Abstract

A central question when parallelizing evolutionary algorithms is the 
choice of the number of parallel instances. In practice optimal param- 
eter settings are often hard to find due to limited information about 
the optimization problem under consideration. We present two adaptive 
schemes for dynamically choosing the number of instances in each gen- 
eration. These schemes work in a black-box setting where no knowledge 
on the function at hand is available. Both schemes provide near-optimal 
speed-ups in terms of the parallel time while not increasing the number of 
function evaluations in an asymptotic sense, compared to upper bounds 
via the fitness-level method. It turns out that the optimization of the 
offspring population size in a $(1+\lambda)$ EA is just a special case in this
context, so our schemes and results also work for the choice of the offspring
population size. 



1 Introduction 



Parallelization is becoming an increasingly important issue for solving difficult
optimization problems [T|. Various implementations of parallel evolutionary 
algorithms (EAs) have been applied in the past decades [TT] . 

One of the most important questions when dealing with parallel EAs is how 
to choose the number of processors such that a good speed-up is achieved in 
terms of the parallel computation time, without wasting computational effort 
in terms of the total sequential computation time. We consider a setting where 
multiple processors try to find improvements of the current best fitness in paral- 
lel. This corresponds to an island model where subpopulations evolve in parallel 
and migration is used to send copies of good individuals to other islands. Our 
setting is greedy in the sense that we assume a complete topology on the islands;
whenever one island finds an improvement of the current best individual in the 
system, this is immediately communicated to all other islands. 

We are interested in finding best-possible speed-ups in such a setting by 
adapting the number of islands. This should be done without increasing the 
asymptotic sequential running time. Choosing the offspring population size of a 
$(1+\lambda)$ EA turns out to be a special case in our setting, where we have $\lambda$ islands,
a $(1+1)$ EA on each island, and a single best individual is sent to all islands.
The offspring population size has already been investigated theoretically and 
empirically by Jansen, De Jong, and Wegener [5]. Our results apply to both 
parallel EAs and offspring populations in the $(1+\lambda)$ EA.

For both parallel EAs and the $(1+\lambda)$ EA we speak of the parallel optimization
time, denoted by $T^{\mathrm{par}}$, as the number of generations until the first global
optimum is evaluated. The sequential optimization time, denoted by $T^{\mathrm{seq}}$, is
defined as the number of function evaluations until the first global optimum is 
evaluated. Note that this includes all function evaluations in the generation of 
the algorithm in which the improvement is found. In both measures we allow 
ourselves to neglect the cost of the initialization as this only adds a fixed term 
to the running times. To unify the notation for parallel EAs and offspring pop- 
ulations, we simply speak of the population size in the following; this means the 
number of islands in the island model and the offspring population size for the 
$(1+\lambda)$ EA, respectively.

In previous work on the choice of the offspring population size [9] and on 
parallel spatially structured EAs with a complete topology [11] it was possi- 
ble to analytically derive asymptotically optimal population sizes for three test 
functions OneMax, LO, and Jump$_k$. However, it remains open whether one
can derive an automatic way of choosing optimal population sizes. This is par- 
ticularly important with regard to problems where it might not be possible or 
worthwhile to perform an analysis. 

In this work we present adaptive schemes for choosing the population size 
and accompany these schemes by a rigorous theoretical analysis of their running 
time. Our schemes are inspired by GPU or cloud computing where it is possible 
to adjust the number of processors on-the-fly. The first scheme doubles the 
population size if the current generation fails to produce an offspring that has 
larger fitness than the current best fitness value. Once an improvement is found, 
the population size drops to 1; only the best individual or island survives. The 
second scheme tries to maintain a good population size over time; it also doubles 
the population size in unsuccessful generations and it halves the population size 
in successful generations. 

Both schemes are oblivious with respect to the function at hand and can 
therefore be applied in a black-box setting where no knowledge is available on 
the function at hand. We prove in the following that, compared to upper bounds 
via the fitness-level method, the expected sequential optimization time does not 
increase asymptotically. For the parallel optimization time, however, the waiting time
for an improvement on every fitness level can be replaced by its logarithm. This
leads to a tremendous speed-up, in particular for problems where improvements 
are hard to find. We present general upper bounds for both schemes as well
as example applications to test functions: OneMax, LO, the class of unimodal
functions, and Jump$_k$.

In our proofs we introduce new arguments on the amortized analysis of 
algorithms, which may find further applications in the analysis of stochastic 
search algorithms and adaptive mechanisms. 

The remainder of this work is structured as follows. In Section 2 we review
previous work. Section 3 presents the algorithms and the considered population
update schemes. In Section 4 we provide technical statements that will be used
later on in our analyses and that may also help to understand the dynamics of
the adaptive algorithms. Section 5 then presents general upper bounds for both
schemes, while Section 6 deals with lower bounds on expected sequential times.
Section 7 contains a brief discussion of tailored, that is, non-oblivious
population update schemes. Our general theorems are applied to concrete example
functions in Section 8. We finish with a discussion of possible extensions in
Section 9 and conclusions in Section 10.

2 Previous Work 

2.1 Adaptive Population Models 

Considering adaptive numbers of islands in the island model of EAs, previous 
work is very limited. However, there are numerous results for adaptive popula- 
tion sizes in EAs. Eiben, Marchiori, and Valko [5] describe EAs with on-the-fly 
population size adjustment. They compared the performance of the different 
strategies in terms of success rate, speed, and solution quality, measured on a 
variety of fitness landscapes. The best EAs with adaptive population resizing 
outperformed traditional approaches. Typical approaches are eliminating popu- 
lation size as an explicit parameter by introducing aging and maximum lifetime 
properties for individuals [T^, the parameter-less GA (PLGA), which evolves a
number of populations of different sizes simultaneously [7], random variation of
the population size [3], and competition schemes [14].

Schwefel [15] first suggested $\lambda$-adaptation, which adapts the offspring
population size during the optimization process. Herdy [B] proposed a mutative
adaptation of $\lambda$ in a two-level ES, where on the upper level, called population
level, $\lambda$ is treated as a variable to be optimized while on the lower level, called
individual level, the object parameters are optimized.

In [B] a deterministic adaptation scheme for the number of offspring $\lambda$ is
introduced. It is based on theoretical considerations on the relation between
the serial rates of progress for the actual number of offspring $\lambda$, for
$\lambda - 1$, and for the optimal number of offspring. More specifically, the local
serial progress (i. e. progress per fitness function evaluation) is optimized in
a $(1,\lambda)$ EA with respect to the number of offspring $\lambda$. The authors prove the
following structural property: the serial progress rate as a function of $\lambda$ is
either a function with exactly one (local and global) maximum or a strictly
monotonically increasing function.
Jansen, De Jong, and Wegener [3] further elaborate on the offspring population
size. A thorough runtime analysis of the effects of the offspring population
size is presented. They also suggest a simple way to dynamically adapt this pa- 
rameter and present empirical results for this scheme, but no theoretical analy- 
sis has been performed. The presented scheme doubles the offspring population 
size if the algorithm fails to improve the currently best fitness value.
Otherwise, it divides the current offspring population size by s, where s is the 
number of offspring with better fitness than the best fitness value so far. We 
will discuss in Section [HI how our schemes relate to their scheme and in how far 
our results can be transferred. 

2.2 Theoretical Work on Parallel EAs 

In [10] a first rigorous runtime analysis for island models has been performed
by constructing a function where alternating phases of independent evolution
and communication among the islands are essential. A simple island model
with migration finds a global optimum in polynomial time, while panmictic 
populations as well as island models without migration need exponential time, 
with very high probability. 

New methods for the running time analysis of parallel evolutionary algorithms
with spatially structured populations have been presented in [11]. The
authors generalized the well-known fitness-level method, also called the method of
$f$-based partitions [18], from panmictic populations to spatially structured
evolutionary algorithms with various migration topologies. These methods were
applied to estimate the speed-up gained by parallelization in pseudo-Boolean 
optimization. The parallel and sequential optimization times were compared to 
upper bounds for a panmictic EA derived via the fitness-level method. It was 
shown that the possible speed-up for the parallel optimization time increases 
with the density of the topology, while not increasing the total number of func- 
tion evaluations, asymptotically. 

More precisely, the classical fitness-level method says that when $s_i$ is a lower
bound on the probability that one island leaves the current fitness level towards
a better one, the expected time until this happens is at most $1/s_i$ for a panmictic
population. In a parallel EA with a unidirectional ring, the expected parallel
time decreases to $O(s_i^{-1/2})$; in other words, the waiting time can be replaced by
its square root. For a torus graph even the third root can be used, and with a
proper choice of the number $\mu$ of islands, a speed-up of order $\mu$ is possible in
some settings.

Interestingly, the results from [11] can partially be interpreted in terms of
adaptive population sizes. The analyses are based on the numbers of individuals 
on the current best fitness level. In our upper bounds we pessimistically assume 
that only islands on the current best fitness level have a reasonable chance of 
finding better fitness levels. All worse individuals are ignored when estimating 
the waiting time for an improvement of the best fitness level. For a unidirectional 
ring, when migration happens in every generation and better individuals are 
guaranteed to win in the selection step, the number of individuals on the current 
best fitness level increases by 1 in each generation as always a new island is taken
over. If an improvement is found, it is pessimistically assumed that then only 
one island has made it to a new, better fitness level. 

This setting corresponds exactly to a parallel EA that in each unsuccessful 
generation acquires one new processor and to an adaptive $(1+\lambda)$ EA that
increases $\lambda$ by 1 in each unsuccessful generation. Once an improvement is found,
the population size drops to 1 as in the case of our first scheme presented here. 
The upper bounds from [11] therefore directly transfer to additive population
size adjustments.

In the following we show that multiplicative adjustments of the population
size may admit better speed-ups than additive approaches as suggested in [11].
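
To make the contrast between additive and multiplicative growth concrete, the following sketch computes the exact expected number of generations until a first success for both growth rules. It assumes that every island succeeds independently with the same probability p in each generation and that the population starts with one island; the function name `expected_generations` and the value p = 10^-4 are illustrative choices, not taken from the analyses discussed above.

```python
def expected_generations(p, next_size, start=1, max_generations=100_000):
    """Exact E[T] for a population that grows via next_size() after each failure.

    Assumption of this sketch: every island succeeds independently with
    probability p per generation. Uses E[T] = sum over t >= 1 of Pr(T >= t).
    """
    expectation = 0.0
    survive = 1.0  # Pr(T >= t): no success in generations 1..t-1
    size = start
    for _ in range(max_generations):
        if survive < 1e-18:  # remaining terms are negligible
            break
        expectation += survive
        survive *= (1.0 - p) ** size
        size = next_size(size)
    return expectation


p = 1e-4
additive = expected_generations(p, lambda mu: mu + 1)   # one new island per failure
doubling = expected_generations(p, lambda mu: 2 * mu)   # population doubles per failure
print(f"additive: {additive:.1f} generations, doubling: {doubling:.1f} generations")
```

Under these assumptions the additive rule needs on the order of the square root of 1/p generations, whereas doubling needs on the order of log(1/p) generations.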

3 Algorithms 

In Sections 5 and 7 we present general upper bounds via the fitness-level method.
These results are general in the following sense. If all islands in a parallel EA 
run elitist algorithms (i. e. algorithms where the best fitness in the population 
can never decrease) and we have a lower bound on the probability of finding a 
better fitness level then this can be turned into an upper bound for the expected 
sequential and parallel running times of the parallel EA. 

We present a scheme for algorithms where this argument applies. The goal is 
to maximize some fitness function $f$ in an arbitrary search space. An adaptation
towards minimization is trivial. 

Algorithm 1 Elitist parallel EA with adaptive population

Let $\mu_1 := 1$ and initialize a single island uniformly at random
for $t := 1$ to $\infty$ do
    for all $1 \le i \le \mu_t$ in parallel do
        Select parents and create offspring by variation
        Send a copy of a fittest offspring to all other islands
        Create $P_{t+1}$ such that it contains a best individual from the union of
            $P_t$, the new offspring, and the incoming migrants
    $\mu_{t+1}$ := updatePopulationSize($P_t$, $P_{t+1}$)
    if $\mu_{t+1} > \mu_t$ then create $\mu_{t+1} - \mu_t$ new islands by copying existing islands
    if $\mu_{t+1} < \mu_t$ then delete $\mu_t - \mu_{t+1}$ islands



The selection of islands to be copied or removed, respectively, can be arbi- 
trary as due to the complete topology all islands always contain an offspring 
with the current best fitness. With other topologies this selection would be 
based on the fitness values of the current elitists on all islands. 

Note that we have neither specified a search space nor variation operators. 
However, in Section 6 we will discuss lower bounds that only hold in pseudo-Boolean
optimization and for EAs that only use standard mutation (i. e. flipping
each of n bits independently with probability 1/n) for creating new offspring. 
The $(1+\lambda)$ EA can be regarded as a special case where we have $\lambda$ islands and
a single best individual takes over all $\lambda$ islands.



Algorithm 2 $(1+\lambda)$ EA with adaptive population

Initialize a current search point $x_1$ uniformly at random
for $t := 1$ to $\infty$ do
    Create $\lambda$ offspring by mutation
    Let $x^*$ be the best offspring
    if $f(x^*) > f(x_t)$ then $x_{t+1} := x^*$ else $x_{t+1} := x_t$
    $\lambda$ := updatePopulationSize($\{x_t\}$, $\{x_{t+1}\}$)
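
The loop of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration under several assumptions that the pseudocode leaves open: the search space is bit strings of length n, mutation is standard bit mutation with rate 1/n, the initial offspring population size is 1, and the update rule is passed in as a callable `update(lam, success)` rather than receiving the two populations. The names `adaptive_ea`, `one_max` and `double_or_reset` are hypothetical.

```python
import random


def one_max(x):
    """Number of one-bits; maximized by the all-ones string."""
    return sum(x)


def adaptive_ea(f, n, target, update, rng):
    """Run the adaptive (1+lambda) EA until a point with fitness >= target is found.

    Returns (final search point, parallel time, sequential time), where the
    parallel time counts generations and the sequential time counts offspring
    evaluations. The cost of initialization is neglected.
    """
    x = [rng.randint(0, 1) for _ in range(n)]
    lam, generations, evaluations = 1, 0, 0
    while f(x) < target:
        # Standard bit mutation: flip each bit independently with probability 1/n.
        offspring = [[bit ^ (rng.random() < 1.0 / n) for bit in x]
                     for _ in range(lam)]
        generations += 1
        evaluations += lam
        best = max(offspring, key=f)
        success = f(best) > f(x)
        if success:
            x = best
        lam = update(lam, success)
    return x, generations, evaluations


# Placeholder update rule: double after a failure, reset to 1 after a success.
double_or_reset = lambda lam, success: 1 if success else 2 * lam
rng = random.Random(42)
x, t_par, t_seq = adaptive_ea(one_max, 50, 50, double_or_reset, rng)
print(f"parallel time: {t_par}, sequential time: {t_seq}")
```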



In Section 8 we will consider concrete example functions where the $(1+\lambda)$ EA
with adaptive populations or, equivalently, an island model running (1+1) EAs
with an adaptive population are applied. The latter was called parallel
(1+1) EA in [10, 11].

We now define the population update schemes considered in this work. The 
function updatePopulationSize takes the old and the new population as inputs 
and it outputs a new population size. 

In order to help finding improvements that take a long time to be found, we 
double the population size in each unsuccessful generation. As we might not 
need that many islands after a success, we reset the population size to 1. 



Algorithm 3 updatePopulationSize($P_t$, $P_{t+1}$) (Scheme A)

if $\max\{f(x) \mid x \in P_{t+1}\} \le \max\{f(x) \mid x \in P_t\}$ then
    return $2\mu_t$
else
    return 1



On problems where finding improvements takes a similar amount of time, 
it might not make sense to throw away all islands at once. Therefore, in the 
following scheme we halve the population size with every successful generation. 
We will see that this does not worsen the asymptotic performance compared to
Scheme A. For some problems this scheme will turn out to be superior. 



Algorithm 4 updatePopulationSize($P_t$, $P_{t+1}$) (Scheme B)

if $\max\{f(x) \mid x \in P_{t+1}\} \le \max\{f(x) \mid x \in P_t\}$ then
    return $2\mu_t$
else
    return $\lfloor \mu_t/2 \rfloor$
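
The difference between the two schemes is easiest to see on a fixed sequence of unsuccessful and successful generations. The sketch below is illustrative only: the functions take the current population size and a success flag instead of the two populations, and `scheme_b` clamps the result to at least 1 so that one island always survives, which is an assumption of this sketch.

```python
def scheme_a(mu, success):
    """Scheme A: double after an unsuccessful generation, reset to 1 after a success."""
    return 1 if success else 2 * mu


def scheme_b(mu, success):
    """Scheme B: double after an unsuccessful generation, halve after a success.

    max(1, ...) is an assumption of this sketch: it keeps at least one island.
    """
    return max(1, mu // 2) if success else 2 * mu


def trace(update, outcomes, mu=1):
    """Population sizes visited for a given list of success flags."""
    sizes = [mu]
    for success in outcomes:
        mu = update(mu, success)
        sizes.append(mu)
    return sizes


# Three failures, two successes, one failure.
outcomes = [False, False, False, True, True, False]
print("Scheme A:", trace(scheme_a, outcomes))
print("Scheme B:", trace(scheme_b, outcomes))
```

After the first success Scheme A has to rebuild the population from a single island, while Scheme B stays close to the size that was sufficient before.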



Our schemes for parallel EAs are applicable in large clusters where the cost 
of allocating new processors is low, compared to the computational effort spent 
within the evolutionary algorithm. Many of our results can be easily adapted 
towards algorithms that do not use migration and population size updates in 
every generation, but only every $\tau$ generations, for a parameter $\tau \in \mathbb{N}$ called
migration interval. This can significantly reduce the costs for allocating and
deallocating processors. Details can be found at the end of Section 5.

An algorithm using Scheme B can be implemented in a decentralized way as 
follows, where we assume that each island runs on a distinct processor. Assume 
all processors are synchronized, i.e., they share a common timer. All proces- 
sors have knowledge on the current best fitness level and they inform all other 
processors by sending messages in case they find a better fitness level. This 
message contains genetic material that is taken over by other processors so that 
all processors work on the current best fitness level. 

In the adaptive scheme, if after one generation no message has been received, 
i. e., no processor has found a better fitness level, each processor activates a new 
processor as follows. Each processor maintains a unique ID. The first processor 
has an ID that simply consists of an empty bit string. Each time a processor 
activates a new processor, it copies its current population and its current ID to 
the new processor. Then it appends a 0-bit to its ID while the new processor 
appends a 1-bit to its ID. At the end, all processors have enlarged their IDs by a 
single bit. When an improvement has been found, all processors first take over 
the genetic material in the messages that are passed. Then all processors whose 
ID ends with a 1-bit shut down. All other processors remove the last bit from 
their IDs. It is easy to see that with this mechanism all processors will always 
have pairwise distinct IDs and no central control is needed to acquire and shut 
down processors. 
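
The ID mechanism can be simulated in a few lines. The sketch below models only the set of processor IDs as bit strings, not the populations or the messages. The function names are hypothetical, and the rule that a lone processor with an empty ID survives a halving step is an assumption of this sketch.

```python
def double(ids):
    """Unsuccessful generation: every processor activates a copy of itself.

    The parent appends a 0-bit to its ID, the new processor appends a 1-bit.
    """
    return [pid + bit for pid in ids for bit in "01"]


def halve(ids):
    """Successful generation: IDs ending in 1 shut down, the rest drop their last bit."""
    if ids == [""]:
        return list(ids)  # assumption: a single processor is never shut down
    return [pid[:-1] for pid in ids if pid.endswith("0")]


ids = [""]
for _ in range(3):
    ids = double(ids)
print(ids)         # eight processors with pairwise distinct 3-bit IDs
ids = halve(ids)
print(ids)         # four processors remain, IDs shortened by one bit
```

Each processor decides locally from the last bit of its own ID whether to shut down, which is why no central control is needed.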

4 Tail Bounds and Expectations 

In preparation for upcoming running time analyses we first prove tail bounds for 
the parallel optimization times in a setting where we are waiting for a specific 
event to happen. This, along with bounds on the expected parallel and sequen- 
tial waiting times, will prove useful later on. The tail bounds also indicate that 
the population will not grow too large. 

In the remainder of this paper we abbreviate $\max\{x, 0\}$ by $(x)^+$.

Lemma 1. Assume starting with $2^k$ islands for some $k \in \mathbb{N}_0$ and doubling the
number of islands in each generation. Let $T^{\mathrm{par}}(p)$ denote the random parallel
time until the first island encounters an event that occurs in each generation
with probability $p$, and let $T^{\mathrm{seq}}(p)$ denote the corresponding sequential time.
Then for every $\alpha \in \mathbb{N}_0$

1. $\Pr\left(T^{\mathrm{par}}(p) > (\lceil \log(1/p) \rceil - k)^+ + \alpha + 1\right) \le \exp(-2^\alpha)$,

2. $\Pr\left(T^{\mathrm{par}}(p) \le \log(1/p) - k - \alpha\right) \le 2 \cdot 2^{-\alpha}$,

3. $\log(1/p) - k - 3 < E(T^{\mathrm{par}}(p)) < (\log(1/p) - k)^+ + 2$,

4. $\max\{1/p, 2^k\} \le E(T^{\mathrm{seq}}(p)) \le 2/p + 2^k - 1$.

Each inequality remains valid if $p$ is replaced by a pessimistic estimation of $p$
(i. e. either an upper bound or a lower bound).

Proof. The condition $T^{\mathrm{par}}(p) > (\lceil \log(1/p) \rceil - k)^+ + \alpha + 1$ requires that the event
does not happen on any island in this time period. The number of trials in the
last generation is at least $2^{\lceil \log(1/p) \rceil + \alpha} \ge 1/p \cdot 2^\alpha$ for all $k \in \mathbb{N}_0$. Hence

$$\Pr\left(T^{\mathrm{par}}(p) > (\lceil \log(1/p) \rceil - k)^+ + \alpha + 1\right) \le (1 - p)^{1/p \cdot 2^\alpha} \le \exp(-2^\alpha).$$

For the second statement we assume $k \le \log(1/p) - \alpha$ as otherwise the claim
is trivial. A necessary condition for $T^{\mathrm{par}}(p) \le \log(1/p) - k - \alpha$ is that the event
happens at least once within the first $\log(1/p) - k - \alpha$ generations. This
corresponds to at most $\sum_{i=1}^{\log(1/p) - \alpha} 2^{i-1} \le 2^{\log(1/p) - \alpha} = 1/p \cdot 2^{-\alpha}$ trials. If
$p > 1/2$ the claim is trivial as either the probability bound on the right-hand
side is at least 1 or the time bound is negative, hence we assume $p \le 1/2$.
Observing that then $1/p \cdot 2^{-\alpha} \le 2(1/p - 1) \cdot 2^{-\alpha}$, the considered probability is
bounded by

$$1 - (1 - p)^{2(1/p - 1) \cdot 2^{-\alpha}} \le 1 - \exp(-2 \cdot 2^{-\alpha}) \le 1 - (1 - 2 \cdot 2^{-\alpha}) = 2 \cdot 2^{-\alpha}.$$

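To bound the expectation we observe that the first statement implies
$\Pr\left(T^{\mathrm{par}}(p) \ge (\log(1/p) - k)^+ + \alpha + 2\right) \le \exp(-2^\alpha)$. Since $T^{\mathrm{par}}$ is
non-negative, we have

$$\begin{aligned}
E(T^{\mathrm{par}}(p)) &= \sum_{t=1}^{\infty} \Pr\left(T^{\mathrm{par}}(p) \ge t\right) \\
&\le (\log(1/p) - k)^+ + 1 + \sum_{\alpha=0}^{\infty} \Pr\left(T^{\mathrm{par}}(p) \ge (\log(1/p) - k)^+ + \alpha + 2\right) \\
&\le (\log(1/p) - k)^+ + 1 + \sum_{\alpha=0}^{\infty} \exp(-2^\alpha) \\
&< (\log(1/p) - k)^+ + 2
\end{aligned}$$

as the last sum is less than 1. For the lower bound we use that the second
statement implies $\Pr\left(T^{\mathrm{par}}(p) > \log(1/p) - k - \alpha\right) \ge 1 - 2 \cdot 2^{-\alpha}$. Hence

$$\begin{aligned}
E(T^{\mathrm{par}}(p)) &= \sum_{t=1}^{\infty} \Pr\left(T^{\mathrm{par}}(p) \ge t\right) \\
&\ge \sum_{\alpha=1}^{\log(1/p)-k-1} \Pr\left(T^{\mathrm{par}}(p) > \log(1/p) - k - \alpha\right) \\
&\ge \sum_{\alpha=1}^{\log(1/p)-k-1} \left(1 - 2 \cdot 2^{-\alpha}\right) \\
&= \log(1/p) - k - 2 - \sum_{\alpha=1}^{\log(1/p)-k-2} 2^{-\alpha} \\
&> \log(1/p) - k - 3.
\end{aligned}$$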

For the fourth statement consider the islands one-by-one, according to some
arbitrary ordering. Let $T(p)$ be the random number of sequential trials until
an event with probability $p$ happens. It is well known that $E(T(p)) = 1/p$.
Obviously $T^{\mathrm{seq}}(p) \ge T(p)$ since the sequential time has to account for all islands
that are active in one generation. This proves $E(T^{\mathrm{seq}}(p)) \ge E(T(p)) \ge 1/p$.
The second lower bound $2^k$ is obvious as at least one generation is needed for a
success.

For the upper bound observe that $T^{\mathrm{seq}}(p) = 2^k$ in case $T(p) \le 2^k$ and
$T^{\mathrm{seq}}(p) = \sum_{i=k}^{j} 2^i$ in case $\sum_{i=k}^{j-1} 2^i < T(p) \le \sum_{i=k}^{j} 2^i$. In the second case
$T(p) \ge 2^j - 2^k + 1$ and $T^{\mathrm{seq}}(p) = 2^{j+1} - 2^k$. Together, we get that
$T^{\mathrm{seq}}(p) \le 2T(p) + 2^k - 1$, hence $E(T^{\mathrm{seq}}(p)) \le 2/p + 2^k - 1$. □
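
The expectation bounds of Lemma 1 can be checked numerically, since both expectations have a simple series form when every island succeeds independently with probability p. The sketch below makes that independence assumption, takes log to base 2, and uses the hypothetical names `expected_times` and `lemma1_holds`; it is a sanity check, not part of the proof.

```python
import math


def expected_times(p, k, max_generations=200):
    """Exact (E[T_par], E[T_seq]) when starting with 2**k islands and doubling.

    Pr(T_par >= t) is the probability that all islands of generations
    1..t-1 fail; generation t costs 2**(k+t-1) evaluations if it is reached.
    """
    e_par = e_seq = 0.0
    survive = 1.0  # Pr(T_par >= t)
    islands = 2 ** k
    for _ in range(max_generations):
        if survive == 0.0:  # underflow: all remaining terms vanish
            break
        e_par += survive
        e_seq += islands * survive
        survive *= (1.0 - p) ** islands
        islands *= 2
    return e_par, e_seq


def lemma1_holds(p, k):
    """Check statements 3 and 4 of Lemma 1 for the given p and k."""
    e_par, e_seq = expected_times(p, k)
    log_term = math.log2(1.0 / p)
    par_ok = log_term - k - 3 < e_par < max(log_term - k, 0.0) + 2
    seq_ok = max(1.0 / p, 2 ** k) <= e_seq <= 2.0 / p + 2 ** k - 1
    return par_ok and seq_ok


for p in (0.5, 0.1, 0.01, 0.001):
    for k in (0, 3, 8):
        e_par, e_seq = expected_times(p, k)
        print(f"p={p}, k={k}: E[T_par]={e_par:.2f}, E[T_seq]={e_seq:.1f}, "
              f"bounds hold: {lemma1_holds(p, k)}")
```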

The presented tail bounds indicate that the population typically does not
grow too large. The probability that the number of generations exceeds the
expectation by an additive value of $\alpha + 1$ is even an inverse doubly exponential
function of $\alpha$. The following provides a handier statement in terms of the
population size. It follows immediately from Lemma 1.

Corollary 1. Consider the setting described in Lemma 1. For every $\beta \ge 1$,
$\beta$ a power of 2, the probability that while waiting for the event to happen the
population size exceeds $\max\{2^{k+1}, 4/p\} \cdot \beta$ is at most $\exp(-\beta)$.

One conclusion from these findings is that our schemes can be applied in 
practice without risking an overly large blowup of the population size. We now 
turn to performance guarantees in terms of expected parallel and sequential 
running times. 

5 Upper Bounds via Fitness Levels 

The following results are based on the fitness-level method or method of $f$-based
partitions. This method is well known for proving upper bounds for algorithms 
that do not accept worsenings of the population. Consider a partition of the 
search space into sets $A_1, \dots, A_m$ where for all $1 \le i \le m-1$ all search points in
$A_i$ are strictly worse than all search points in $A_{i+1}$ and $A_m$ contains all global
optima. If each set $A_i$ contains only a single fitness value then the partition is
called a canonic partition.

If $s_i$ is a lower bound on the probability of creating a search point in
$A_{i+1} \cup \dots \cup A_m$, provided the current best search point is in $A_i$, then the expected
optimization time is bounded from above by

$$\sum_{i=1}^{m-1} \Pr(A_i) \cdot \sum_{j=i}^{m-1} \frac{1}{s_j},$$

where $\Pr(A_i)$ abbreviates the probability that the best search point after
initialization is in $A_i$. The reason for this bound is that the expected time until
$A_i$ is left towards a higher fitness-level set is at most $1/s_i$ and each fitness level,
starting from the initial one, has to be left at most once. Note that we can
always simplify the above bound by pessimistically assuming that the population
is initialized in $A_1$. This removes the term "$\sum_{i=1}^{m-1} \Pr(A_i) \cdot$" and only leaves
$\sum_{j=1}^{m-1} 1/s_j$. This way of simplifying upper bounds can be used for all results
presented hereinafter.

The fitness-level method yields good upper bounds in many cases. This 
includes situations where an evolutionary algorithm typically moves through 
increasing fitness levels, without skipping too many levels [TB]. It only gives 
crude upper bounds in case the values $s_i$ are dominated by search points from
which the probability of leaving $A_i$ is much lower than for other search points
in $A_i$ or if there are levels with difficult local optima (i.e. large values $1/s_i$)
that are only reached with a small probability.

Using the expectation bounds from Section 4 we now show the following
result. The main implication is that for both schemes, A and B, in the upper
bound for the expected parallel time the expected sequential waiting time is
replaced by its logarithm. In addition, compared to the fully serialized algorithm,
the expected sequential time does not increase asymptotically with respect
to the upper bound gained by $f$-based partitions.

In the remainder of the paper we denote by $T_x^{\mathrm{par}}$ and $T_x^{\mathrm{seq}}$, $x \in \{A, B\}$, the
parallel time and the sequential time for Schemes A and B, respectively.

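Theorem 1. Given an $f$-based partition $A_1, \dots, A_m$,

$$E(T_A^{\mathrm{seq}}) \le \sum_{i=1}^{m-1} \Pr(A_i) \cdot 2 \sum_{j=i}^{m-1} \frac{1}{s_j}.$$

If the partition is canonic then also

$$E(T_A^{\mathrm{par}}) \le \sum_{i=1}^{m-1} \Pr(A_i) \cdot 2 \sum_{j=i}^{m-1} \log\left(\frac{2}{s_j}\right).$$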
The reason for the constant 2 in $\log(2/s_j)$ is to ensure that the term does
not become smaller than 1; with a constant 1 the value $s_j = 1$ would even lead
to a summand $\log(1/s_j) = 0$.

Proof. We only need to prove asymptotic bounds on the conditional expectations
when starting in $A_i$, with a common constant hidden in all $O$-terms. The
law of total expectation then implies the claim.

For Scheme A we apply Lemma 1 with $k = 0$. This yields that the expected
sequential time for leaving the current fitness level $A_j$ towards $A_{j+1} \cup \dots \cup A_m$
is at most $2/s_j$ and the expected parallel time is at most $\log(1/s_j) + 2 \le 2 \log(2/s_j)$.
The expected sequential time is hence bounded by $2 \sum_{j=i}^{m-1} 1/s_j$
and the expected parallel time is at most $2 \sum_{j=i}^{m-1} \log(2/s_j)$. □
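
As an illustration of how the two bounds for Scheme A compare, the sketch below evaluates them for OneMax on n bits. It relies on assumptions that are not part of the theorem: the well-known estimate $s_i \ge (n-i)/(en)$ for a search point with i one-bits under standard bit mutation, the pessimistic initialization on the lowest level, and log taken to base 2. The function names are hypothetical.

```python
import math


def scheme_a_bounds(s):
    """Return (sequential, parallel) upper bounds for Scheme A.

    s[j] is a lower bound on the probability of leaving fitness level j.
    The run is pessimistically assumed to start on the first level.
    """
    sequential = 2.0 * sum(1.0 / sj for sj in s)
    parallel = 2.0 * sum(math.log2(2.0 / sj) for sj in s)
    return sequential, parallel


def onemax_success_probabilities(n):
    """Assumed estimate s_i >= (n - i) / (e * n) for i one-bits, i = 0..n-1."""
    return [(n - i) / (math.e * n) for i in range(n)]


n = 100
seq_bound, par_bound = scheme_a_bounds(onemax_success_probabilities(n))
print(f"sequential bound: {seq_bound:.0f} evaluations")
print(f"parallel bound:   {par_bound:.0f} generations")
```

The sequential bound equals $2en$ times the n-th harmonic number, while the parallel bound grows only linearly in n under these assumptions.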

We prove a similar upper bound for Scheme B using arguments from the 
amortized analysis of algorithms [21 Chapter 17]. Amortized analysis is used to 
derive statements on the average running time of an operation or to estimate the 
total costs of a sequence of operations. It is especially useful if some operations 
may be far more costly than others and if expensive operations imply that 
many other operations will be cheap. The basic idea of the so-called accounting 
method is to let all operations pay for the costs of their execution. Operations are 
allowed to pay excess amounts of money to fictional accounts. Other operations 
can then tap this pool of money to pay for their costs. As long as no account 
becomes overdrawn, the total costs of all operations are bounded by the total
amount of money that has been paid or deposited. 

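Theorem 2. Given an $f$-based partition $A_1, \dots, A_m$,

$$E(T_B^{\mathrm{seq}}) \le \sum_{i=1}^{m-1} \Pr(A_i) \cdot 3 \sum_{j=i}^{m-1} \frac{1}{s_j}.$$

If the partition is canonic then also

$$E(T_B^{\mathrm{par}}) \le \sum_{i=1}^{m-1} \Pr(A_i) \cdot 4 \sum_{j=i}^{m-1} \log\left(\frac{2}{s_j}\right).$$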

Proof. We use the accounting method as follows to bound the expected sequential
optimization time of B. Assume the algorithm is on level $j$ with
a population size of $2^k$. If the current generation passes without leaving the
current fitness level, we pay $2^k$ to cover the costs for the sequential time in
this generation. In addition, we pay another $2^k$ to a fictional bank account. In
case the generation is successful in leaving $A_j$ and the previous generation was
unsuccessful, we just pay $2^k$ and do not make a deposit. In case the current
generation is successful and the last unsuccessful generation was on fitness level $j$,
we withdraw $2^k$ from the bank account to pay for the current generation. In
other words, the current generation is for free. This way, if there is a sequence
of successful generations after an unsuccessful one on level $j$, all but the first
successful generation are for free.
Let us verify that the bank account cannot be overdrawn. The basic argument
is that, whenever the population size is decreased from, say, $2^{k+1}$ to
$2^k$ then there must be a previous generation where the population size was
increased from $2^k$ to $2^{k+1}$. It is easy to see that associating a decrease with the
latest increase gives an injective mapping. In simpler terms, the latest generation
that has increased the population size from $2^k$ to $2^{k+1}$ has already paid for
the current decrease to $2^k$.

When in the upper bound for A fitness level $i$ takes sequential time
$1 + 2 + \dots + 2^k = 2^{k+1} - 1$ then for B the total costs paid are $2(1 + 2 + \dots + 2^{k-1}) + 2^k$
as a successful generation does not make a deposit to the bank account. The
total costs equal $2^{k+1} - 2 + 2^k \le 3/2 \cdot (2^{k+1} - 1)$. In consequence, the total costs
for Scheme B are at most 3/2 times the costs for A in A's upper bound. This proves
the claimed upper bound for B.

By the very same argument an upper bound for the expected parallel time
for B follows. Instead of paying $2^k$ and maybe making a deposit of $2^k$, we always
pay 1 and always make a deposit of 1. When withdrawing money, we always
withdraw 1. This proves that also $E(T_B^{\mathrm{par}})$ is at most twice the corresponding
upper bound for Scheme A. □
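
The accounting argument can be replayed on arbitrary sequences of successful and unsuccessful generations. The sketch below encodes one reading of the payment rules: an unsuccessful generation pays its own cost and deposits the same amount, a successful generation that directly follows a halving step is paid from the account, and every other successful generation pays for itself. The lower limit of 1 on the population size and all names are assumptions of this sketch.

```python
import random


def simulate_accounting(outcomes):
    """Replay Scheme B and return (actual_cost, paid, min_balance).

    actual_cost is the true sequential cost (sum of population sizes),
    paid is everything charged (generation costs plus deposits), and
    min_balance is the lowest value the fictional account ever had.
    """
    size = 1
    balance = 0
    min_balance = 0
    actual_cost = 0
    paid = 0
    came_from_halving = False
    for success in outcomes:
        actual_cost += size
        if not success:
            paid += 2 * size        # pay for the generation and deposit the same amount
            balance += size
            size *= 2
            came_from_halving = False
        else:
            if came_from_halving:
                balance -= size     # free generation, paid from the account
            else:
                paid += size        # this successful generation pays for itself
            new_size = max(1, size // 2)  # assumption: never fewer than one island
            came_from_halving = new_size < size
            size = new_size
        min_balance = min(min_balance, balance)
    return actual_cost, paid, min_balance


rng = random.Random(0)
outcomes = [rng.random() < 0.5 for _ in range(30)]
actual_cost, paid, min_balance = simulate_accounting(outcomes)
print(f"actual cost {actual_cost} <= paid {paid}, lowest balance {min_balance}")
```

Since the actual cost equals the amount paid minus the final balance, a balance that never becomes negative implies that the payments are an upper bound on the true sequential cost.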

The argument in the above proof can also be used for proving a general upper
bound for the expected parallel optimization time for B. When paying costs 2
for each fitness level, this pays for the successful generation with a population
size of, say, $2^k$ and for one future generation where the population size might
have to be doubled to reach $2^k$ again.

Imagine the sequence of population sizes over time and then delete all elements
where the population size has decreased, including the associated generation
where the population size was increased beforehand. In the remaining
sequence the population size continually increases until, assuming a global
optimum has not been found yet, after $n \log n$ generations a population size of
at least $n^n$ is reached. In this case the probability of creating a global optimum
by mutation is at least $1 - (1 - n^{-n})^{n^n} \ge 1 - 1/e$ as the probability of hitting
any specific target point in one mutation is at least $n^{-n}$. The expected number
of generations until this happens is clearly $O(1)$. We have thus shown the
following.

Corollary 2. For every function with $m$ function values we have
$E(T_B^{\mathrm{par}}) \le 2m + n \log n + O(1)$.

This bound is asymptotically tight, for instance, for long path problems [U
I13j. So, the $m$-term is, in general, necessary.

When comparing A and B with respect to the expected parallel time, we 
expect B to perform better if the fitness levels have a similar degree of difficulty. 
This implies that there is a certain target level for the population size. Note, 
however, that such a target level does not exist in case the $s_i$-values are dissimilar.
In the case of similar $s_i$-values A might be forced to spend time doubling
the population size for each fitness level until the target level has been reached. 
This waiting time is reflected by the $\log(2/s_j)$-terms in Theorem 1. The following
upper bound on B shows that these log-terms can be avoided to some
extent. In the special yet rather common situation that improvements become 
harder with each fitness level, only the biggest such log-term is needed. 

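Theorem 3. Given a canonical $f$-based partition $A_1, \dots, A_m$, $E(T_B^{\mathrm{par}})$ is
bounded by

$$\sum_{i=1}^{m-1} \Pr(A_i) \cdot \left( 3(m-i-1) + \log\left(\frac{1}{s_i}\right) + \sum_{j=i+1}^{m-1} \left( \log\left(\frac{1}{s_j}\right) - \log\left(\frac{1}{s_{j-1}}\right) \right)^+ \right).$$

If additionally $s_1 \ge s_2 \ge \dots \ge s_{m-1}$ then the bound simplifies to

$$\sum_{i=1}^{m-1} \Pr(A_i) \cdot \left( 3(m-i-1) + \log\left(\frac{1}{s_{m-1}}\right) \right).$$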
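
The following sketch compares the parallel-time bound of Theorem 2 with that of Theorem 3 on a function whose fitness levels are all equally hard. It reads the general bound of Theorem 3 as $3(m-i-1) + \log(1/s_i)$ plus the sum of the positive increases of $\log(1/s_j)$ over the later levels, places the start pessimistically in $A_1$, and takes log to base 2. The level probabilities $s_i = 1/(en)$ are the standard estimate for LO under standard bit mutation and are used here as an assumption; the function names are hypothetical.

```python
import math


def theorem2_parallel_bound(s):
    """4 * sum of log(2/s_j) over all levels, pessimistic start in the first level."""
    return 4.0 * sum(math.log2(2.0 / sj) for sj in s)


def theorem3_parallel_bound(s):
    """3(m-2) + log(1/s_1) + sum of positive increases of log(1/s_j).

    s holds s_1, ..., s_{m-1}; the start is pessimistically placed in A_1.
    """
    m = len(s) + 1
    logs = [math.log2(1.0 / sj) for sj in s]
    increases = sum(max(b - a, 0.0) for a, b in zip(logs, logs[1:]))
    return 3 * (m - 2) + logs[0] + increases


n = 50
equal_levels = [1.0 / (math.e * n)] * n   # assumed estimate for LO
print(f"Theorem 2: {theorem2_parallel_bound(equal_levels):.1f} generations")
print(f"Theorem 3: {theorem3_parallel_bound(equal_levels):.1f} generations")
```

When the level probabilities are non-increasing, the positive increases telescope and only the largest log-term remains, which matches the simplified form of the bound.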

Proof. The second claim immediately follows from the first one as the log-terms 
form a telescoping sum. 

For the first bound we again use arguments from amortized analysis. By
Lemma 1, if the current population size is $2^k$ then the expected number of
generations until an improvement from level $i$ happens is at most
$(\log(1/s_i) - k)^+ + 2$. This is a bound of 2 if $k \ge \log(1/s_i)$. We
perform a so-called aggregate analysis to estimate the total cost on all
fitness levels. These costs are attributed to different sources. Summing up the
costs for all sources will yield a bound on the total costs and hence on
$T^{\mathrm{par}}_B$.

In the first generation the fitness level $i^*$ the algorithm starts on pays
$\log(1/s_{i^*})$ to the global bank account. Afterwards costs are assigned as
follows. Consider a generation on fitness level $i$ with a population size of
$2^k$.

• If the current generation is successful, we charge cost 2 to the fitness level; 
cost 1 pays for the effort in the generation and cost 1 is deposited on the 
bank account. In addition, each fitness level $j$ that is skipped or reached
during this improvement pays $(\log(1/s_j) - \log(1/s_{j-1}))^+$ as a deposit
on the bank account. Note that this amount is non-negative and it may be
fractional.

• If $k \ge \log(1/s_i)$ and the current generation is unsuccessful we charge
cost 1 to the fitness level.

• If $k < \log(1/s_i)$ and the current generation is unsuccessful we withdraw
cost 1 from our bank account.

By Lemma 1 the expected cost charged to fitness level $i$ in unsuccessful
generations (i.e., not counting the last successful generation) is at most 1.
Assuming that the bank account is never overdrawn, the overall expected cost
for fitness level $i$ is at most $1 + 2 + (\log(1/s_i) - \log(1/s_{i-1}))^+$.
Adding the costs for the initial fitness level yields the claimed bound.

We use the so-called potential method to show that the bank account is never 
overdrawn. Our claim is that at any point of time there is enough money on 
the bank account to cover the costs of increasing the current population size to 
at least $2^{\log(1/s_j)}$ where $j$ is the current fitness level. We construct a potential
function indicating the excess money on the bank account and show that the 
potential is always non-negative. 

Let $\mu_t$ denote the population size in generation $t$ and $\ell_t$ be the
(random) fitness level in generation $t$. By $b_t$ we denote the account
balance on the bank account. We prove by induction that

$$b_t \ge (\log(1/s_{\ell_t}) - \log(\mu_t))^+ .$$

As this bound is never negative, this implies that the account is never
overdrawn. After the initial fitness level has made its deposit we have
$b_1 := \log(1/s_{\ell_1}) - 0$. Assume by induction that the bound holds for
$b_t$.

If generation $t$ is unsuccessful and $\log(\mu_t) \ge \log(1/s_{\ell_t})$ then
the population size is doubled at no cost for the bank account. As by induction
$b_t \ge 0$ we have
$b_{t+1} = b_t \ge 0 = (\log(1/s_{\ell_t}) - \log(\mu_{t+1}))^+$.

If generation $t$ is unsuccessful and $\log(\mu_t) < \log(1/s_{\ell_t})$ then
the algorithm doubles its population size and withdraws 1 from the bank
account. As $b_t$ is positive and $\log(\mu_{t+1}) = \log(\mu_t) + 1$, we have

$$b_{t+1} = b_t - 1 \ge \log(1/s_{\ell_t}) - \log(\mu_t) - 1 = \log(1/s_{\ell_t}) - \log(\mu_{t+1}) .$$

If generation $t$ is successful and the current fitness level increases from
$i$ to $j > i$, the account balance is increased by

$$1 + \sum_{a=i+1}^{j} (\log(1/s_a) - \log(1/s_{a-1}))^+ \ge 1 + (\log(1/s_j) - \log(1/s_i))^+ .$$

This implies

$$\begin{aligned}
b_{t+1} &\ge b_t + 1 + (\log(1/s_j) - \log(1/s_i))^+ \\
&\ge (\log(1/s_i) - \log(\mu_t))^+ + 1 + (\log(1/s_j) - \log(1/s_i))^+ \\
&\ge (\log(1/s_j) - \log(\mu_t))^+ + 1 \\
&\ge (\log(1/s_j) - \log(\mu_{t+1}))^+ .
\end{aligned}$$

□

The upper bounds in this section can be easily adapted towards parallel 
EAs that do not perform migration and population size adaptation in every 
generation, but only every $\tau$ generations, for a migration interval
$\tau \in \mathbb{N}$. Instead of considering the probability of leaving a
fitness level in one generation, we simply consider the probability of leaving
a fitness level in $\tau$ generations. This is done by considering
$s_i' := 1 - (1 - s_i)^{\tau}$ instead of $s_i$. The resulting time bounds,
based on $s_1', \dots, s_{m-1}'$, are then with respect to the number of
periods of $\tau$ generations. To get bounds on our original measures of time,
we just multiply all bounds by a factor of $\tau$.
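
As a small illustration of this transformation, the following Python sketch computes the success probability for a whole period and rescales a bound stated in periods back to generations. The function names, the migration interval `tau` and the example values are our own assumptions and not part of the analysis above.

```python
def period_success_probability(s: float, tau: int) -> float:
    """Probability of leaving a fitness level within tau generations,
    given success probability s per generation: 1 - (1 - s)^tau."""
    if not 0.0 <= s <= 1.0 or tau < 1:
        raise ValueError("need 0 <= s <= 1 and tau >= 1")
    return 1.0 - (1.0 - s) ** tau


def periods_to_generations(bound_in_periods: float, tau: int) -> float:
    """Convert a time bound measured in periods of tau generations
    back to generations by multiplying with tau."""
    return bound_in_periods * tau


if __name__ == "__main__":
    s_values = [0.5, 0.1, 0.01]   # hypothetical per-generation success probabilities
    tau = 5                       # hypothetical migration interval
    print([round(period_success_probability(s, tau), 4) for s in s_values])
    print(periods_to_generations(12.0, tau))
```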

6 Lower Bounds 

In order to prove lower bounds for the expected sequential time we make use of 
recent results by Sudholt [12]. He presented a new lower-bound method based 
on fitness-level arguments. If it is unlikely that many fitness levels are skipped 
when leaving the current fitness-level set then good lower bounds can be shown. 

The lower bound applies to every algorithm A in pseudo-Boolean optimiza- 
tion that only uses standard mutations (i. e. flipping each bit independently with 
probability 1/n) to create new offspring. Such an EA is called a mutation-based 
EA. More precisely, every mutation-based EA A works as follows. 

First, A creates $\mu$ search points $x_1, \dots, x_\mu$ uniformly at random.
Then it repeats the following loop. A counter $t$ counts the number of function
evaluations; after initialization we have $t = \mu$. In one iteration of the
loop the algorithm first selects one out of all search points $x_1, \dots, x_t$
that have been created so far. This decision is based on the fitness values
$f(x_1), \dots, f(x_t)$ and, possibly, also the time index $t$. It performs a
standard mutation of this search point, creating an offspring $x_{t+1}$.
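
The framework just described can be sketched in a few lines of Python. This is a minimal illustration, not the formal definition: the names `mutation_based_ea`, `select_best` and the evaluation `budget` are hypothetical, and the greedy selection rule is just one admissible choice among many.

```python
import random
from typing import Callable, List, Sequence, Tuple

Bitstring = List[int]


def standard_mutation(x: Sequence[int], rng: random.Random) -> Bitstring:
    """Flip each bit independently with probability 1/n."""
    n = len(x)
    return [1 - b if rng.random() < 1.0 / n else b for b in x]


def select_best(values: Sequence[float], t: int) -> int:
    """Example selection rule: pick a search point with best fitness so far.
    The time index t is available to the rule but unused here."""
    return max(range(len(values)), key=values.__getitem__)


def mutation_based_ea(
    f: Callable[[Sequence[int]], float],
    n: int,
    mu: int,
    budget: int,
    select: Callable[[Sequence[float], int], int],
    rng: random.Random,
) -> Tuple[List[Bitstring], List[float]]:
    """Create mu random points, then repeatedly select one of all points
    created so far and apply a standard mutation to it."""
    history = [[rng.randint(0, 1) for _ in range(n)] for _ in range(mu)]
    values = [f(x) for x in history]
    while len(history) < budget:          # len(history) plays the role of t
        parent = history[select(values, len(history))]
        child = standard_mutation(parent, rng)
        history.append(child)
        values.append(f(child))
    return history, values


if __name__ == "__main__":
    hist, vals = mutation_based_ea(sum, 10, 3, 200, select_best, random.Random(0))
    print(max(vals))
```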

To make this work self-contained, we cite (a slightly simplified version of) 
the result here. The performance measure considered is the number of function 
evaluations, which one can assume to coincide with the number of mutations. 

Theorem 4 ([16]). Consider a partition of the search space into non-empty sets
$A_1, \dots, A_m$ such that only $A_m$ contains global optima. For a
mutation-based EA A we say that A is in $A_i$ or on level $i$ if the best
individual created so far is in $A_i$. Let the probability of traversing from
level $i$ to level $j$ in one mutation be at most $u_i \cdot \gamma_{i,j}$ and
$\sum_{j=i+1}^{m} \gamma_{i,j} = 1$. Assume that for all $j > i$ and some
$0 < \chi \le 1$ it holds $\gamma_{i,j} \ge \chi \sum_{k=j}^{m} \gamma_{i,k}$.
Then the expected number of function evaluations of A on f is at least

$$\sum_{i=1}^{m-1} \Pr(A_i) \cdot \chi \sum_{j=i}^{m-1} \frac{1}{u_j} .$$

All population update schemes are compatible with this framework; every 
parallel mutation-based EA using an arbitrary population update scheme is still 
a mutation-based EA. Offspring creations are performed in parallel in our al- 
gorithms, but one can imagine these operations to be performed sequentially. 
Since the selection can be based on the time index t it is easy to exclude that 
offspring created in the current generation are used as parents ahead of time. 
By storing knowledge on the times when each island has been active and also
recording migrations, this information can also be used to mimic the population
management mechanism and to ensure that only search points from the currently
active island are chosen as parents. There is one caveat: the parent selection
mechanism in [16] does not account for possibly randomized decisions made
during migration. However, the proof of Theorem 4 goes through in case
additional knowledge is used.

Definition 1. Call an f-based partition $A_1, \dots, A_m$ (asymptotically)
tight for an algorithm A if there exist constants $c \ge 1 \ge \chi > 0$ and
values $\gamma_{i,j}$ such that for each population in $A_i$ the following
holds.

1. The probability of generating a population in $A_{i+1} \cup \dots \cup A_m$
in one mutation is at least $s_i$.

2. The probability of generating a population in $A_j$ in one mutation,
$j > i$, is at most $c \cdot s_i \cdot \gamma_{i,j}$.

3. For the $\gamma_{i,j}$-values it holds that
$\sum_{j=i+1}^{m} \gamma_{i,j} = 1$ and
$\gamma_{i,j} \ge \chi \sum_{k=j}^{m} \gamma_{i,k}$ for all $i < j$.

Tight f-based partitions imply that the standard upper bound by f-based
partitions [T5] is asymptotically tight. This holds for all elitist mutation-based
algorithms, that is, mutation-based algorithms where the best fitness value in 
the population can never decrease. 

Theorem 5. Consider an algorithm A with an arbitrary population update
strategy that only uses standard mutations for creating new offspring. Given a
tight f-based partition $A_1, \dots, A_m$ for a function $f$, we have

$$E(T^{\mathrm{seq}}) = \Omega\Big( \sum_{i=1}^{m-1} \Pr(A_i) \cdot \sum_{j=i}^{m-1} \frac{1}{s_j} \Big) .$$

Proof. The lower bound on $E(T^{\mathrm{seq}})$ follows by a direct application
of Theorem 4. We already discussed that this theorem applies to all algorithms
considered in this work. Setting $u_j := c s_j$ for all $1 \le j < m$, $c$ and
$\chi$ being as in Definition 1, Theorem 4 implies

$$E(T^{\mathrm{seq}}) \ge \sum_{i=1}^{m-1} \Pr(A_i) \cdot \chi \sum_{j=i}^{m-1} \frac{1}{c\, s_j} .$$

As both $\chi$ and $c$ are constants, this implies the claim. □

This lower bound shows that for tight f-based partitions both our population
update schemes produce asymptotically optimal results in terms of the expected
sequential optimization time.

7 Non-oblivious Update Schemes

We also briefly discuss update schemes that are tailored towards particular
functions, in order to judge the performance of our oblivious update schemes.

Non-oblivious population update schemes may allow for smaller upper bounds for
the expected parallel time than the ones seen so far. When the population
update scheme has complete knowledge on the function $f$ and the f-based
partition, an upper bound can be shown where each fitness level contributes
only a constant to the expected parallel time. By
$T^{\mathrm{seq}}_{\mathrm{no}}$ and $T^{\mathrm{par}}_{\mathrm{no}}$ we denote
the sequential and parallel times of the considered non-oblivious scheme.

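Theorem 6. Given an arbitrary f-based partition $A_1, \dots, A_m$, there is a
tailored population update scheme for which

$$E(T^{\mathrm{seq}}_{\mathrm{no}}) \le \frac{2e}{e-1} \sum_{i=1}^{m-1} \Pr(A_i) \sum_{j=i}^{m-1} \frac{1}{s_j} \quad \text{and} \quad E(T^{\mathrm{par}}_{\mathrm{no}}) \le \frac{e}{e-1} \sum_{i=1}^{m-1} \Pr(A_i) \cdot (m-i) .$$

In particular, $E(T^{\mathrm{par}}_{\mathrm{no}}) = O(m)$.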

Proof. The update scheme chooses to use $\lceil 1/s_i \rceil$ islands if the
algorithm is in $A_i$. Then the probability of finding an improvement in one
generation is at least $1 - (1 - s_i)^{1/s_i} \ge 1 - 1/e$. The expected
parallel time until this happens is at most $e/(e-1)$ and so the expected
sequential time is at most
$e/(e-1) \cdot \lceil 1/s_i \rceil \le 2e/(e-1) \cdot 1/s_i$. Summing up these
expectations for all fitness levels from $i$ to $m-1$ proves the two bounds. □
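
The choice of $\lceil 1/s_i \rceil$ islands can be checked numerically. The following Python sketch is only an illustration; the function names and the sample success probabilities are our own assumptions.

```python
import math


def tailored_islands(s: float) -> int:
    """Number of islands used by the tailored scheme on a level with
    success probability s: ceil(1/s)."""
    return math.ceil(1.0 / s)


def generation_success_probability(s: float, islands: int) -> float:
    """Probability that at least one of the islands finds an improvement."""
    return 1.0 - (1.0 - s) ** islands


def sequential_cost_bound(s: float) -> float:
    """Bound e/(e-1) * ceil(1/s) on the expected sequential time per level."""
    return math.e / (math.e - 1.0) * tailored_islands(s)


if __name__ == "__main__":
    for s in (1.0, 0.5, 0.3, 0.01, 0.0001):   # hypothetical success probabilities
        lam = tailored_islands(s)
        p = generation_success_probability(s, lam)
        # The proof uses p >= 1 - 1/e and ceil(1/s) <= 2/s.
        assert p >= 1.0 - 1.0 / math.e
        assert sequential_cost_bound(s) <= 2.0 * math.e / (math.e - 1.0) / s + 1e-9
        print(s, lam, round(p, 4))
```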

In some situations it is possible to design schemes that perform even better 
than the above bound suggests. For instance, for trap functions the best strategy 
would be to use a very large population in the first generation so that the 
optimum is found with high probability, and before the algorithm is tricked to 
increasing the distance to the global optimum. 

8 Bounds for Example Functions 

The previous bounds all applied in a very general context, with arbitrary fit- 
ness functions. We also give results for selected example functions to estimate 
possible speed-ups in more concrete settings. 

We consider the set of example functions and function classes that has already
been investigated in [11]. The goal is the maximization of a pseudo-Boolean
function $f\colon \{0,1\}^n \to \mathbb{R}$. For a search point
$x \in \{0,1\}^n$ write $x = x_1 \dots x_n$, then
$\mathrm{OneMax}(x) := \sum_{i=1}^{n} x_i$ counts the number of ones in $x$ and
$\mathrm{LO}(x) := \sum_{i=1}^{n} \prod_{j=1}^{i} x_j$ counts the number of
leading ones in $x$. A function is called unimodal if every non-optimal search
point has a Hamming neighbor
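(i.e., a point with Hamming distance 1 to it) with strictly larger fitness. For
$1 \le k \le n$ we also consider

$$\mathrm{Jump}_k(x) := \begin{cases} k + \sum_{i=1}^{n} x_i & \text{if } \sum_{i=1}^{n} x_i \le n-k \text{ or } x = 1^n, \\ \sum_{i=1}^{n} (1 - x_i) & \text{otherwise.} \end{cases}$$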

This function has been introduced by Droste, Jansen, and Wegener [J] as a 
function with tunable difficulty as evolutionary algorithms typically have to 
perform a jump to overcome a gap by flipping k specific bits. 
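
For readers who prefer code, the example functions can be written down directly. The following Python sketch follows the standard definitions of OneMax, LO and Jump; the function names are our own choice.

```python
from typing import Sequence


def one_max(x: Sequence[int]) -> int:
    """Number of ones in x."""
    return sum(x)


def leading_ones(x: Sequence[int]) -> int:
    """Number of leading ones in x (length of the longest all-ones prefix)."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count


def jump(x: Sequence[int], k: int) -> int:
    """Jump function with gap size k: OneMax shifted by k, except for a gap
    of points with more than n-k ones that are not the all-ones string."""
    n = len(x)
    ones = sum(x)
    if ones <= n - k or ones == n:
        return k + ones
    return n - ones


if __name__ == "__main__":
    x = [1, 1, 0, 1]
    print(one_max(x), leading_ones(x), jump(x, 2))
```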

For these functions we obtain bounds for $T^{\mathrm{seq}}$ and
$T^{\mathrm{par}}$ as summarized in Table 1. The lower bounds for
$E(T^{\mathrm{seq}})$ on OneMax and LO follow directly from [16] for all schemes.





Function            Scheme           $E(T^{\mathrm{seq}})$    $E(T^{\mathrm{par}})$

OneMax              A                $\Theta(n \log n)$       $O(n \log n)$
                    B                $\Theta(n \log n)$       $O(n)$
                    non-oblivious    $\Theta(n \log n)$       $O(n)$

LO                  A                $\Theta(n^2)$            $\Theta(n \log n)$
                    B                $\Theta(n^2)$            $O(n)$
                    non-oblivious    $\Theta(n^2)$            $O(n)$

unimodal f          A                $O(dn)$                  $O(d \log n)$
with d f-values     B                $O(dn)$                  $O(d + \log n)$
                    non-oblivious    $O(dn)$                  $O(d)$

Jump_k              A                $O(n^k)$                 $O(n \log n)$
with k ≥ 2          B                $O(n^k)$                 $O(n + k \log n)$
                    non-oblivious    $O(n^k)$                 $O(n)$

Table 1: Asymptotic bounds for expected parallel running times
$E(T^{\mathrm{par}})$ and expected sequential running times
$E(T^{\mathrm{seq}})$ for the parallel (1+1) EA and the (1+$\lambda$) EA with
adaptive population models.



Theorem 7. For the parallel (1+1) EA and the (1+$\lambda$) EA with adaptive
population models the upper bounds for $E(T^{\mathrm{seq}})$ and
$E(T^{\mathrm{par}})$ hold as given in Table 1.

Proof. The upper bounds for Scheme A follow from Theorem 1, for Scheme B from
Theorems 2 and 3, and for the non-oblivious scheme from Theorem 6. Starting
pessimistically from the first fitness level, the following bounds hold:

• For OneMax we are using the canonical f-based partition
$A_i := \{x \mid \mathrm{OneMax}(x) = i\}$ and the corresponding success
probabilities $s_i \ge (n-i)/n \cdot (1 - 1/n)^{n-1} \ge (n-i)/(en)$. Hence,
$E(T^{\mathrm{par}}_A) \le 2 \sum_{i=0}^{n-1} \log\big(\frac{2en}{n-i}\big) \le 2n \log(2en) = O(n \log n)$,

$$E(T^{\mathrm{seq}}_A) \le 2 \sum_{i=0}^{n-1} \frac{1}{s_i} \le 2 \sum_{i=0}^{n-1} \frac{en}{n-i} = 2en \sum_{i=1}^{n} \frac{1}{i} \le 2en \cdot [(\ln n) + 1] ,$$

$E(T^{\mathrm{par}}_B) \le 3(n-2) + \log(2en) = O(n)$ and
$E(T^{\mathrm{seq}}_B) \le 3en \cdot [(\ln n) + 1]$,
$E(T^{\mathrm{par}}_{\mathrm{no}}) = O(n)$ and
$E(T^{\mathrm{seq}}_{\mathrm{no}}) = O(n \log n)$.

• For LO we are using the canonical f-based partition
$A_i := \{x \mid \mathrm{LO}(x) = i\}$ and the corresponding success
probabilities $s_i \ge 1/n \cdot (1 - 1/n)^{n-1} \ge 1/(en)$. Hence,
$E(T^{\mathrm{par}}_A) \le 2 \sum_{i=0}^{n-1} \log(2en) = 2n \log(2en) = O(n \log n)$,

$$E(T^{\mathrm{seq}}_A) \le 2 \sum_{i=0}^{n-1} \frac{1}{s_i} \le 2 \sum_{i=0}^{n-1} en = 2en^2 ,$$

$E(T^{\mathrm{par}}_B) \le 3(n-2) + \log(en) = O(n)$,
$E(T^{\mathrm{seq}}_B) \le 3en^2$, $E(T^{\mathrm{par}}_{\mathrm{no}}) = O(n)$
and $E(T^{\mathrm{seq}}_{\mathrm{no}}) = O(n^2)$.

• For unimodal functions with $d$ function values, w. l. o. g.
$\{1, \dots, d\}$, we are using corresponding success probabilities
$s_i \ge 1/(en)$. Hence,
$E(T^{\mathrm{par}}_A) \le 2 \sum_{i=1}^{d-1} \log(2en) \le 2d \log(2en) = O(d \log n)$,

$$E(T^{\mathrm{seq}}_A) \le 2 \sum_{i=1}^{d-1} \frac{1}{s_i} \le 2 \sum_{i=1}^{d-1} en \le 2edn ,$$

$E(T^{\mathrm{par}}_B) \le 3(d-2) + \log(en) = O(d + \log n)$,
$E(T^{\mathrm{seq}}_B) \le 3edn$, $E(T^{\mathrm{par}}_{\mathrm{no}}) = O(d)$
and $E(T^{\mathrm{seq}}_{\mathrm{no}}) = O(dn)$.

• For $\mathrm{Jump}_k$ functions with $k \ge 2$ and all individuals having
neither $n-k$ nor $n$ 1-bits an improvement is found by either increasing or
decreasing the number of 1-bits. This corresponds to optimizing OneMax. In
order to improve a solution with $n-k$ 1-bits a specific bit string with
Hamming distance $k$ has to be created, which has probability $s_{n-k}$ at least

$$\Big(\frac{1}{n}\Big)^k \Big(1 - \frac{1}{n}\Big)^{n-k} \ge \frac{1}{e n^k} .$$

Hence,
$E(T^{\mathrm{par}}_A) \le O(n \log n) + 2 \log(e n^k) \le O(n \log n) + 2k \log(en) = O(n \log n)$,
$E(T^{\mathrm{seq}}_A) \le O(n^k)$,
$E(T^{\mathrm{par}}_B) \le O(n) + k \log(en) = O(n + k \log n)$,
$E(T^{\mathrm{seq}}_B) \le O(n^k)$, $E(T^{\mathrm{par}}_{\mathrm{no}}) = O(n)$
and $E(T^{\mathrm{seq}}_{\mathrm{no}}) = O(n^k)$. □
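
The OneMax calculation for Scheme A can be reproduced numerically. The following Python sketch evaluates the two sums from the proof for a concrete $n$ and checks them against the closed-form estimates. It assumes base-2 logarithms and uses the lower bound $(n-i)/(en)$ in place of the true success probabilities; the function names are hypothetical.

```python
import math


def onemax_success_lower_bound(n: int, i: int) -> float:
    """Lower bound (n - i) / (e n) on the success probability on level i."""
    return (n - i) / (math.e * n)


def scheme_a_parallel_bound(n: int) -> float:
    """2 * sum_{i=0}^{n-1} log(2 / s_i) with s_i replaced by its lower bound."""
    return 2 * sum(
        math.log2(2 / onemax_success_lower_bound(n, i)) for i in range(n)
    )


def scheme_a_sequential_bound(n: int) -> float:
    """2 * sum_{i=0}^{n-1} 1 / s_i with s_i replaced by its lower bound."""
    return 2 * sum(1 / onemax_success_lower_bound(n, i) for i in range(n))


if __name__ == "__main__":
    n = 100
    # Closed-form estimates used in the proof.
    assert scheme_a_parallel_bound(n) <= 2 * n * math.log2(2 * math.e * n)
    assert scheme_a_sequential_bound(n) <= 2 * math.e * n * (math.log(n) + 1)
    print(scheme_a_parallel_bound(n), scheme_a_sequential_bound(n))
```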

It can be seen from Table 1 that both our schemes lead to significant speed-ups
in the considered settings. The speed-ups increase with the difficulty of the
function. This becomes obvious when comparing the results on OneMax and LO and
it is even more visible for $\mathrm{Jump}_k$.

The upper bounds for $E(T^{\mathrm{par}}_B)$ are always asymptotically lower
than those for $E(T^{\mathrm{par}}_A)$, except for $\mathrm{Jump}_k$ with
$k = \Theta(n)$. However, without corresponding
lower bounds we cannot say whether this is due to differences in the real run- 
ning times or whether we simply proved tighter guarantees for B. We therefore 
consider the function LO in more detail and prove a lower bound for A. This 
demonstrates that Scheme B can be asymptotically better than Scheme A on a 
concrete problem. 

Theorem 8. For the parallel (1+1) EA and the (1+$\lambda$) EA with adaptive
population models on LO we have $E(T^{\mathrm{par}}_A) = \Omega(n \log n)$.

Proof. We consider a pessimistic setting (pessimistic for proving a lower bound)
where an improvement has probability exactly $1/n$. This ignores that all
leading ones have to be conserved in order to increase the best LO-value. We
show that with probability $\Omega(1)$ at least $n/30$ improvements are needed
in this setting. As by Lemma 1 the expected waiting time for an improvement is
at least $\max\{0, (\log n) - 3\}$, the conditional expected parallel time is
$\Omega(n \log n)$. By the law of total expectation, also the unconditional
expected parallel time is then $\Omega(n \log n)$.

Let us bound the expected increase in the number of leading ones on one fitness
level. Let $T^{\mathrm{par}}_i$ denote the random number of generations until
the best fitness increases when the algorithm is on fitness level $i$. By the
law of total expectation the expected increase in the best fitness in this
generation equals

$$\sum_{t=1}^{\infty} \Pr(T^{\mathrm{par}}_i = t) \cdot E(\text{LO-increase} \mid T^{\mathrm{par}}_i = t) . \qquad (1)$$

The expected increase in the number of leading ones can be estimated as follows.
With $T^{\mathrm{par}}_i = t$ the number of mutations in the successful
generation is $2^{t-1}$. Let $I$ denote the number of mutations that increase
the current best LO-value. A well-known property of LO is that when the current
best fitness is $i$ then the bits at positions $i+2, \dots, n$ are uniform.
Bits that form part of the leading ones after an improvement are called free
riders. The probability of having $k$ free riders is thus $2^{-(k+1)}$ (unless
the end of the bit string is reached) and the expected number of free riders is
at most $\sum_{k=0}^{\infty} k \cdot 2^{-(k+1)} = 1$.

The uniformity of "random" bits at positions $i+2, \dots, n$ holds after any
specific number of mutations and in particular after the mutations in
generation $T^{\mathrm{par}}_i$ have been performed. However, when looking at
multiple improvements, the free-rider events are not necessarily independent as
the "random" bits are very likely to be correlated. The following reasoning
avoids these possible dependencies. We consider the improvements in generation
$T^{\mathrm{par}}_i$ one-by-one. If $F_1$ denotes the random number of free
riders gained in the first improvement, when considering the second improvement
the bits at positions $i + 3 + F_1, \dots, n$ are still uniform. In some sense,
we give away the free riders from a fitness improvement for free for all
following improvements. This leads to an estimation of $1 + F_1$ for the gain
in the number of leading ones.
Iterating this argument, the expected total number of leading ones gained is
thus bounded by $2I$, the expectation being taken for the randomness of free
riders. Also considering the expectation for the random number of improvements
yields the bound $2E(I \mid I \ge 1)$ as $I$ has been defined with respect to
the last (i.e. successful) generation. We also observe
$E(I \mid I \ge 1) \le 1 + E(I) \le 1 + 2^t/n$. Plugging this into Equation (1)
yields

$$\begin{aligned}
&\sum_{t=1}^{\infty} \Pr(T^{\mathrm{par}}_i = t) \cdot (2 + 2^{t+1}/n) \\
&= 2 + 2 \sum_{t=0}^{\infty} \Pr(T^{\mathrm{par}}_i = t+1) \cdot 2^{t+1}/n \\
&\le 2 + 2 \sum_{t=0}^{\infty} \Pr(T^{\mathrm{par}}_i > t) \cdot 2^{t+1}/n \\
&\le 2 + 2 \sum_{t=0}^{\lceil \log n \rceil} 2^{t+1}/n + 2 \sum_{t=\lceil \log n \rceil + 1}^{\infty} \Pr(T^{\mathrm{par}}_i > t) \cdot 2^{t+1}/n .
\end{aligned}$$

The first sum is at most 16. Using Lemma 1 to estimate the second sum, we
arrive at the bound

$$\begin{aligned}
&18 + 2 \sum_{\alpha=0}^{\infty} \Pr(T^{\mathrm{par}}_i > \lceil \log n \rceil + \alpha + 1) \cdot 2^{\lceil \log n \rceil + \alpha + 2}/n \\
&\le 18 + 2 \sum_{\alpha=0}^{\infty} \exp(-2^{\alpha}) \cdot 2^{\lceil \log n \rceil + \alpha + 2}/n \\
&\le 18 + 16 \cdot \sum_{\alpha=0}^{\infty} \exp(-2^{\alpha}) \cdot 2^{\alpha} \\
&< 29.8 .
\end{aligned}$$

With probability 1/2 the algorithm starts with no leading ones, independently 
from all following events. The expected number of leading ones after n/30 
improvements is at most 29.8/30 • n. By Markov's inequality the probability of 
having created n leading ones is thus at most 29.8/30 and so with probability 
$1/2 \cdot 0.2/30 = \Omega(1)$ having $n/30$ improvements is not enough to find a global
optimum. □ 
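
Two numerical constants appear in this proof: the expected number of free riders, $\sum_{k \ge 0} k \cdot 2^{-(k+1)} = 1$, and the constant 29.8 bounding $18 + 16 \sum_{\alpha \ge 0} \exp(-2^{\alpha}) \cdot 2^{\alpha}$. The following Python sketch evaluates truncated versions of both series as a sanity check. The truncation points and function names are our own assumptions; the neglected tails are far below the printed precision.

```python
import math


def free_rider_expectation(terms: int = 200) -> float:
    """Truncated series sum_{k>=0} k * 2^-(k+1), which converges to 1."""
    return sum(k * 2.0 ** -(k + 1) for k in range(terms))


def expected_gain_constant(terms: int = 60) -> float:
    """Truncated value of 18 + 16 * sum_{a>=0} exp(-2^a) * 2^a."""
    series = sum(math.exp(-(2.0 ** a)) * 2.0 ** a for a in range(terms))
    return 18 + 16 * series


if __name__ == "__main__":
    print(round(free_rider_expectation(), 6))
    print(round(expected_gain_constant(), 3))
```

The second value stays below 29.8, which is what the final Markov argument needs.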

9 Generalizations & Extensions 

We finally discuss generalizations and extensions of our results. 

One interesting question is in how far our results change if the population is 
not doubled or halved, but instead multiplied or divided by some other value b > 
1. We believe that then the results would change as follows. With some potential 
adjustments to constant factors, the log-terms in the parallel optimization
times in Theorems 1, 2, and 3 would have to be replaced by $\log_b$. For the
sequential optimization times stated in these theorems one would need to
multiply these bounds by $b/2$. This means that a larger $b$ would further
decrease the parallel
optimization times at the expense of a larger sequential optimization time. 

Our analyses can also be transferred towards the adaptive scheme presented 
by Jansen, De Jong, and Wegener [H]. Recall that in their scheme the population 
size is divided by the number of successes. In case of one success the population 
size remains unchanged. This only affects the constant factors in our upper 
bounds. When the number of successes is large, the population size might 
decrease quickly. In most cases, however, the number of successes will be rather 
small; for instance, the lower bound for LO, Theorem 8, has shown that the
expected number of successes in a successful generation is constant. However, 
it might be possible that after a difficult fitness level an easier fitness level is 
reached and then the number of successes might be much higher. In an extreme 
case their scheme can decrease the population size like Scheme A. In some sense, 
their scheme is somewhat "in between" A and B. With a slight adaptation of the 
constants, the upper bound for Scheme A from Theorem 1 can be transferred
to their scheme. 

Another extension of the results above is towards maximum population sizes. 
Although we have argued in Section 0] that the population size does not blow up 
too much, in practice the maximum number of processors might be limited. The 
following theorem about $E(T^{\mathrm{par}}_A)$ for maximum population sizes can be proven
by applying arguments from [11].

Theorem 9. The expected parallel optimization time of Scheme A for a maximum
population size $\mu_{\max}$ is bounded by



Proof. We pessimistically estimate the expected parallel time by the time until
the population consists of $\mu_{\max}$ islands plus the expected optimization
time if $\mu_{\max}$ islands are available. The time until $\mu_{\max}$ islands
are involved is $\log \mu_{\max}$ on one fitness level. Hence, summing up all
levels pessimistically gives $m \log \mu_{\max}$. For $\mu_{\max}$ islands the
success probability on fitness level $i$ with success probability $s_i$ for one
island is given by $1 - (1 - s_i)^{\mu_{\max}}$. Hence, the expected time for
leaving fitness level $i$ if $\mu_{\max}$ islands are available is at most
$1/[1 - (1 - s_i)^{\mu_{\max}}]$. Now we consider two cases.
If $s_i \cdot \mu_{\max} \le 1$ we have
$1 - (1 - s_i)^{\mu_{\max}} \ge 1 - (1 - s_i \mu_{\max}/2) = s_i \mu_{\max}/2$
because for all $0 \le xy \le 1$ it holds $(1 - x)^y \le 1 - xy/2$ [iTJ Lemma 1].
Otherwise, if $s_i \cdot \mu_{\max} > 1$ we have
$1 - (1 - s_i)^{\mu_{\max}} \ge 1 - e^{-s_i \mu_{\max}} \ge 1 - \frac{1}{e}$.
Thus,




m—1 
Adding the expected waiting times until $\mu_{\max}$ islands are involved yields the
claimed bound. □
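
The two cases of this proof can be checked numerically. The following Python sketch compares the exact success probability with $\mu_{\max}$ islands against the lower bounds used above; the function names and the grid of test values are our own assumptions.

```python
import math


def success_probability(s: float, mu_max: int) -> float:
    """Probability that at least one of mu_max islands finds an improvement."""
    return 1.0 - (1.0 - s) ** mu_max


def case_lower_bound(s: float, mu_max: int) -> float:
    """Lower bound on the success probability from the two cases of the proof."""
    if s * mu_max <= 1.0:
        return s * mu_max / 2.0
    return 1.0 - 1.0 / math.e


def leaving_time_bound(s: float, mu_max: int) -> float:
    """Resulting upper bound on the expected number of generations needed
    to leave a level once mu_max islands are in use."""
    return 1.0 / case_lower_bound(s, mu_max)


if __name__ == "__main__":
    for s in (0.5, 0.1, 0.01, 0.001):       # hypothetical success probabilities
        for mu_max in (1, 8, 64, 1024):     # hypothetical processor limits
            assert success_probability(s, mu_max) >= case_lower_bound(s, mu_max)
    print(leaving_time_bound(0.01, 8))
```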



In terms of our test functions OneMax, LO, unimodal functions, and $\mathrm{Jump}_k$,
this leads to the following result that can be proven like Theorem 7.

Corollary 3. For the parallel (1+1) EA and the (1+$\lambda$) EA with Scheme A
the following holds for a maximum population size $\mu_{\max}$:

• $E(T^{\mathrm{par}}_A) = O(n \log \mu_{\max} + n \log n \log(\mu_{\max})/\mu_{\max})$
for OneMax, which gives $O(n \log \log n)$ for $\mu_{\max} = \log n$,

• $E(T^{\mathrm{par}}_A) = O(n \log \mu_{\max} + n^2 \log(\mu_{\max})/\mu_{\max})$
for LO, which gives $O(n \log n)$ for $\mu_{\max} = n$,

• $E(T^{\mathrm{par}}_A) = O(d \log \mu_{\max} + dn \log(\mu_{\max})/\mu_{\max})$
for unimodal functions with $d$ function values, which gives $O(d \log n)$ for
$\mu_{\max} = n$,

• $E(T^{\mathrm{par}}_A) = O(n \log \mu_{\max} + n^k \log(\mu_{\max})/\mu_{\max})$
for $\mathrm{Jump}_k$, which gives $O(nk \log n)$ for $\mu_{\max} = n^{k-1}$.

Note that Corollary 3 has led to an improvement of $E(T^{\mathrm{par}}_A)$ on
OneMax from $O(n \log n)$ to $O(n \log \log n)$ for $\mu_{\max} = \log n$. This
obviously also holds in the setting of
unrestricted population sizes. 

10 Conclusions 

We have presented two schemes for adapting the offspring population size in 
evolutionary algorithms and, more generally, the number of islands in parallel 
evolutionary algorithms. Both schemes double the population size in each gen- 
eration that does not yield an improvement. Despite the exponential growth, 
the expected sequential optimization time is asymptotically optimal for tight 
f-based partitions. In general, we obtain bounds that are asymptotically equal
to upper bounds via the fitness-level method. 

In terms of the parallel computation time expected waiting times can be re- 
placed by their logarithms for both schemes, compared to a serial EA. This yields 
a tremendous speed-up, in particular for functions where finding improvement is 
difficult. Scheme B, doubling or halving the population size in each generation, 
turned out to be more effective than resets to a single island as in Scheme A. 

Apart from our main results, we have introduced the notion of tight f-based
partitions and new arguments from amortized analysis of algorithms to the 
theory of evolutionary algorithms. 

An open question is how our schemes perform in case the fitness-level method 
does not provide good upper bounds. In this case our bounds may be off from 
the real expected running times. In particular, there may be examples where 
increasing the offspring population size by too much might be detrimental. One 
constructed function where large offspring populations perform badly was
presented in [9]. Future work could characterize function classes for which our
schemes are efficient in comparison to the real expected running times. The
notion of tight f-based partitions is a first step in this direction.

Abstract

We present two adaptive schemes for dynamically choosing the number 
of parallel instances in parallel evolutionary algorithms. This includes 
the choice of the offspring population size in a (1+$\lambda$) EA as a special
case. Our schemes are parameterless and they work in a black-box setting 
where no knowledge on the problem is available. Both schemes double 
the number of instances in case a generation ends without finding an 
improvement. In a successful generation, the first scheme resets the system 
to one instance, while the second scheme halves the number of instances. 
Both schemes provide near-optimal speed-ups in terms of the parallel 
time. We give upper bounds for the asymptotic sequential time (i. e., 
the total number of function evaluations) that are not larger than upper 
bounds for a corresponding non-parallel algorithm derived by the fitness- 
level method. 



1 Introduction 



Parallelization is becoming a more and more important issue for solving difficult 
optimization problems [T]. Various implementations of parallel evolutionary 
algorithms (EAs) have been applied in the past decades [T7] . An obvious way of 
using parallelization is to parallelize single operations of an EA such as executing 
fitness evaluations on different processors. This particularly applies to EAs using 
large offspring populations. So-called island models use parallelization on a 
higher level. The idea is to parallelize evolution itself, by having subpopulations, 
called islands, which evolve in parallel. Good solutions are exchanged between 
the islands in a migration process.
One of the most important questions when dealing with parallel EAs is how
to choose the number of processors in order to decrease the parallel optimization 
time, defined as the number of generations until an EA has found a global 
optimum. Assume a setting where we can choose the number of processors to 
be allocated, but we have to pay costs for each processor in each generation it 
is being used. This situation is common in cloud computing or in large grids 
where processors are shared with other users. The total cost for all processors 
over time is called sequential optimization time. The task is now to choose 
the number of processors to be used such that the parallel optimization time 
is small, but at the same time the sequential time is reasonable. Allocating 
too many processors would waste computational effort and hence unnecessarily 
increase the sequential optimization time. Allocating too few processors implies 
a large parallel optimization time. 

During the run of an EA, the "ideal" value for the number of processors is 
likely to change over time. One typical situation is that in the beginning of a run 
improvements are easy to obtain and only few processors are needed. The better 
the best fitness, the tougher it gets to find further improvements and then more 
processors are required. It therefore makes sense to look at adaptive mechanisms 
that can adjust the number of processors which are being used during the run 
of the EA. This obviously only makes sense in a setting where allocating and 
deallocating processors on-the-fly is possible and the cost for these operations 
and the cost for the communication between the processors are rather small. 
Hence we focus on balancing the parallel and sequential optimization times. 

In the following we present adaptive schemes for choosing the number of 
processors that apply both to offspring populations as well as island models of 
EAs. We accompany our schemes by a rigorous theoretical analysis of their 
running time. Both schemes double the number of processors if the current 
generation fails to produce an offspring that has larger fitness than the current 
best fitness value. Otherwise, if the generation yields an improvement, the 
number of processors is decreased again. The difference between the two schemes 
lies in the way the number of processors is decreased. 

The first scheme, called Scheme A, simply resets the number of processors 
to 1; only the best individual or island survives. This is to avoid an overly 
large number of processors when moving from a situation where improvements 
are hard to find to a situation where improvements are easy. This happens, for 
instance, if the EA escapes from a local optimum and then jumps to the basin 
of attraction of a better local optimum. 

The second scheme, Scheme B, tries to maintain a fair number of processors
over time; it also doubles the population size in unsuccessful generations and it 
halves the population size in successful generations. This strategy makes more 
sense in case the EA encounters similar probabilities for improvements over time. 
Both schemes are parameterless and oblivious with respect to the objective 
function. They can be applied in a black-box setting where no knowledge is 
available about the problem. 
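
The two update rules can be summarized in a few lines of code. The following Python sketch simulates a (1+$\lambda$) EA with an adaptive offspring population size. It is a minimal illustration under our own assumptions (standard bit mutation with rate 1/n, a known optimal fitness value as stopping criterion, hypothetical function and parameter names), not a reference implementation.

```python
import random
from typing import Callable, List, Sequence, Tuple


def mutate(x: Sequence[int], rng: random.Random) -> List[int]:
    """Standard bit mutation: flip each bit independently with probability 1/n."""
    n = len(x)
    return [1 - b if rng.random() < 1.0 / n else b for b in x]


def adaptive_ea(
    f: Callable[[Sequence[int]], float],
    n: int,
    optimum: float,
    scheme: str,
    rng: random.Random,
) -> Tuple[int, int]:
    """Run until fitness `optimum` is reached. Returns (parallel time,
    sequential time), i.e. generations and offspring evaluations; the
    evaluation of the initial search point is not counted."""
    parent = [rng.randint(0, 1) for _ in range(n)]
    best = f(parent)
    pop_size, parallel_time, sequential_time = 1, 0, 0
    while best < optimum:
        offspring = [mutate(parent, rng) for _ in range(pop_size)]
        parallel_time += 1
        sequential_time += pop_size
        candidate = max(offspring, key=f)
        value = f(candidate)
        improved = value > best
        if value >= best:                 # a best offspring replaces its parent
            parent, best = candidate, value
        if not improved:
            pop_size *= 2                 # both schemes double after a failure
        elif scheme == "A":
            pop_size = 1                  # Scheme A: reset to one instance
        else:
            pop_size = max(1, pop_size // 2)   # Scheme B: halve
    return parallel_time, sequential_time


if __name__ == "__main__":
    for scheme in ("A", "B"):
        print(scheme, adaptive_ea(sum, 50, 50, scheme, random.Random(42)))
```

Here `sum` plays the role of OneMax. Running both schemes on the same problem gives a rough feeling for the trade-off between generations and total evaluations, although single runs are of course noisy.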

In terms of offspring populations we consider the (1+$\lambda$) EA that maintains a
single best individual and in each iteration creates $\lambda$ offspring. A best offspring
replaces its parent if its fitness is not worse. The $\lambda$ offspring creations and
function evaluations can be parallelized on $\lambda$ processors. Concerning island
models, we assume that migration sends copies of each island's best individual 
to each other island in every generation. So, whenever one island finds an 
improvement of the current best individual in the system, this is immediately 
communicated to all other islands. The island model then behaves similarly 
to offspring populations, but it is more general as the islands can work with 
populations of size larger than 1. 

To unify the notation for island models and offspring populations, we sim- 
ply speak of the population size in the following; this means the number of 
islands in the island model and the offspring population size for the (1+$\lambda$) EA,
respectively. 

For EAs using either Scheme A or B we show that the expected parallel op- 
timization time can be decreased drastically. In comparison to the well-known 
fitness-level method, in the parallel optimization time for every fitness value 
the expected waiting time for an improvement can be replaced by its loga- 
rithm. This can drastically reduce the parallel optimization time, in particular 
for problems where improvements are hard to find. The expected sequential 
time remains reasonable. We prove upper bounds on the expected sequential 
optimization time that are asymptotically no larger than upper bounds for a 
single instance obtained via the fitness-level method. For problems where the 
fitness-level method gives tight bounds, our results show that the two schemes 
automatically yield decreased expected parallel optimization times, without in- 
creasing the expected sequential time. 

The mentioned bounds are general in the sense that they apply to islands 
running arbitrary elitist algorithms. Example applications are given that apply 
simultaneously to the $(1+\lambda)$ EA and to islands of population size 1. Various 
functions are considered: OneMax, LO, the class of unimodal functions, and 
$\mathrm{Jump}_k$. 

Comparing the different schemes, our results indicate that Scheme B is more 
efficient than A, from an asymptotic perspective, as it quickly reduces the num- 
ber of processors, if necessary. This adaptation automatically leads to optimal 
or near-optimal parallel optimization times on all considered examples. On one 
example Scheme B outperforms Scheme A. We also compare these results with 
tailored schemes that are allowed to use knowledge on the objective function. 

Besides the main results this paper is also interesting because of the meth- 
ods used. We introduce new techniques from the amortized analysis of al- 
gorithms, which represent natural and effective tools for analyzing adaptive 
mechanisms. These techniques may find further applications in the analysis of 
adaptive stochastic search algorithms. 

The remainder of this work is structured as follows. In Section 2 we review 
previous work. Section 3 presents the algorithms and the considered population 
update schemes. In Section 4 we provide technical statements that will be used 
later on in our analyses and that may also help to understand the dynamics of 
the adaptive algorithms. Section 5 then presents general upper bounds for both 
schemes, while Section 6 deals with lower bounds on expected sequential times. 
Section 7 contains a brief discussion about tailored, that is, non-oblivious 
population update schemes. Our general theorems are applied to concrete example 
functions in Section 8. We finish with a discussion of possible extensions in 
Section 9 and conclusions in Section 10. 

2 Previous Work 

2.1 Adaptive Population Models 

Considering adaptive numbers of islands in the island model of EAs, previous 
work is very limited. However, there are numerous results for adaptive popula- 
tion sizes in EAs. Eiben, Marchiori, and Valko [5] describe EAs with on-the-fly 
population size adjustment. They compared the performance of the different 
strategies in terms of success rate, speed, and solution quality, measured on a 
variety of fitness landscapes. The best EAs with adaptive population resizing 
outperformed traditional approaches when considering the time to result, which 
is the parallel optimization time. Typical approaches are eliminating population 
size as an explicit parameter by introducing aging and maximum lifetime 
properties for individuals [12], the parameter-less GA (PLGA) which evolves a 
number of populations of different sizes simultaneously [7], random variation of 
the population size [3], and competition schemes [14]. 

Schwefel [15] first suggested the adaptation of the offspring population size 
during the optimization process. Herdy proposed a mutative adaptation of $\lambda$ 
in a two-level ES, where on the upper level, called population level, $\lambda$ is treated 
as a variable to be optimized while on the lower level, called individual level, 
the object parameters are optimized. 

In [S], a deterministic adaptation scheme for $\lambda$ based on theoretical 
considerations on the relation between serial rates of progress for the actual number 
of offspring $\lambda$, for $\lambda - 1$ and for the optimal number of offspring is introduced. 
More specifically, the local serial progress (i.e., progress per fitness function 
evaluation) is optimized in a $(1,\lambda)$ EA with respect to the number of offspring $\lambda$. 
The authors prove the following structural property: the serial progress rate as 
a function of $\lambda$ is either a function with exactly one (local and global) maximum 
or a strictly monotonically increasing function. 

Jansen, De Jong, and Wegener further elaborate on the offspring popu- 
lation size, presenting a thorough runtime analysis of the effects of the offspring 
population size. They also suggest a simple way to dynamically adapt this pa- 
rameter and present empirical results for this scheme, but no theoretical analysis 
of their scheme has been performed. The presented scheme doubles the offspring 
population size if the algorithm fails to improve the currently best 
fitness value. Otherwise, it divides the current offspring population size by $s$, 
where $s$ is the number of offspring with better fitness than the best fitness value 
so far. We will discuss in Section 9 how our schemes relate to their scheme and 
to what extent our results can be transferred. 

2.2 Theoretical Work on Parallel EAs 



In [10], a first rigorous runtime analysis for island models has been performed 
by constructing a function where alternating phases of independent evolution 
and communication among the islands are essential. A simple island model 
with migration finds a global optimum in polynomial time, while panmictic 
populations as well as island models without migration need exponential time, 
with very high probability. 

New methods for the running time analysis of parallel evolutionary algorithms 
with spatially structured populations have been presented in [11]. The 
authors generalized the well-known fitness-level method, also called the method of 
$f$-based partitions [18], from panmictic populations to spatially structured 
evolutionary algorithms with various migration topologies. These methods were 
applied to estimate the speed-up gained by parallelization in pseudo-Boolean 
optimization. It was shown that the possible speed-up for the parallel optimiza- 
tion time increases with the density of the topology. The expected sequential 
optimization time is asymptotically not larger than an upper bound for a cor- 
responding non-parallel EA, derived via the fitness-level method. 

More precisely, the classical fitness-level method says that when $s_i$ is a lower 
bound on the probability that one island leaves the current fitness level towards 
a better one, the expected time until this happens is at most $1/s_i$ for a panmictic 
population. In a parallel EA with a unidirectional ring, the expected parallel 
time decreases to $O(s_i^{-1/2})$; in other words, the waiting time can be replaced by 
its square root. For a torus graph even the third root can be used and with a 
proper choice of the number $\mu$ of islands, a speed-up of order $\mu$ is possible in 
some settings. 

Interestingly, the results from [11] can partially be interpreted in terms of 
adaptive population sizes. The analyses are based on the number of individuals 
on the current best fitness level. In our upper bounds, we pessimistically assume 
that only islands on the current best fitness level have a reasonable chance of 
finding better fitness levels. All worse individuals are ignored when estimating 
the waiting time for an improvement of the best fitness level. If a unidirectional 
ring topology is used, migration happens in every generation, and better indi- 
viduals are guaranteed to win in the selection step, the number of individuals 
on the current best fitness level increases by 1 in each generation as always a 
new island is taken over. (We pessimistically ignore the fact that islands on 
worse fitness levels can improve their best fitness.) If any island finds an im- 
provement, it is pessimistically assumed that then only one island has made 
it to a new, better fitness level. This setting corresponds exactly to a parallel 
EA that in each unsuccessful generation acquires one new processor and to 
an adaptive $(1+\lambda)$ EA that increases $\lambda$ by 1 in each unsuccessful generation. 
Once an improvement is found, the population size drops to 1 as in the case of 
our first scheme presented here. The upper bounds from [11] therefore directly 
transfer to additive population size adjustments. In the following we show that 
multiplicative adjustments of the population size may admit better speed-ups 
than additive approaches as suggested in [11]. 

3 Algorithms 



In Sections 5 and 7 we present general upper bounds via the fitness-level method. 
These results are general in the following sense. If all islands in a parallel EA 
run elitist algorithms (i.e., algorithms where the best fitness in the population 
can never decrease) and if we have a lower bound on the probability of finding a 
better fitness level then this can be turned into an upper bound for the expected 
sequential and parallel running times of the parallel EA. 

We present a scheme for algorithms where this argument applies. The goal is 
to maximize some fitness function $f$ in an arbitrary search space. An adaptation 
towards minimization is trivial. 

Algorithm 1 Elitist parallel EA with adaptive population 

1: Let $\mu_1 := 1$ and initialize a single island uniformly at random. 
2: for $t := 1$ to $\infty$ do 
3:   for all $1 \le i \le \mu_t$ in parallel do 
4:     Select parents and create offspring by variation. 
5:     Send a copy of a fittest offspring to all other islands. 
6:     Create $P_{t+1}^i$ such that it contains a best individual from the union of 
       $P_t^i$, the new offspring, and the incoming migrants. 
7:   $\mu_{t+1} := \text{updatePopulationSize}(P_t^1, P_{t+1}^1)$ 
8:   if $\mu_{t+1} > \mu_t$ then create $\mu_{t+1} - \mu_t$ new islands by copying existing islands. 
9:   if $\mu_{t+1} < \mu_t$ then delete $\mu_t - \mu_{t+1}$ islands. 



The selection of islands to be copied or removed, respectively, is left un- 
specified. Note that each island migrates individuals to all other islands. This 
corresponds to a complete migration topology. Due to this fact, all islands 
always contain an offspring with the current best fitness. This observation is 
sufficient for the upcoming analyses. With other topologies this selection would 
be based on the fitness values of the current elitists on all islands. 

The $(1+\lambda)$ EA can be regarded as a special case where we have $\lambda$ islands and 
a single best individual takes over all $\lambda$ islands. Setting $\lambda := 1$ yields the 
well-known $(1+1)$ EA. 



Algorithm 2 $(1+\lambda)$ EA with adaptive population 

1: Initialize a current search point $x_1$ uniformly at random. 
2: for $t := 1$ to $\infty$ do 
3:   Create $\lambda$ offspring by mutation. 
4:   Let $x^*$ be an offspring with maximal fitness. 
5:   if $f(x^*) \ge f(x_t)$ then $x_{t+1} := x^*$ else $x_{t+1} := x_t$. 
6:   $\lambda := \text{updatePopulationSize}(\{x_t\}, \{x_{t+1}\})$ 



Note that we have neither specified a search space nor variation operators. 
However, in Section 6 we will discuss lower bounds that only hold in pseudo-Boolean 
optimization and for EAs that only use standard mutation (i.e., flipping 
each of $n$ bits independently with probability $1/n$) for creating new offspring. 

In Section 8 we will consider concrete example functions where the $(1+\lambda)$ EA 
with adaptive populations or, equivalently, an island model running $(1+1)$ EAs 
with an adaptive number of islands is applied. The latter was called parallel 
$(1+1)$ EA in [Him. 

We now define the population update schemes considered in this work. The 
function updatePopulationSize takes the old and the new population as inputs 
and it outputs a new population size. 

In order to help finding improvements that take a long time to be found, we 
double the population size in each unsuccessful generation. As we might not 
need that many islands after a success, we reset the population size to 1. 



Algorithm 3 updatePopulationSize($P_t$, $P_{t+1}$) (Scheme A) 

1: if $\max\{f(x) \mid x \in P_{t+1}\} \le \max\{f(x) \mid x \in P_t\}$ then 
2:   return $2\mu_t$ 
3: else 
4:   return 1 



On problems where finding improvements takes a similar amount of time, 
it might not make sense to throw away all islands at once. Especially if im- 
provements have similar probabilities over time, it makes sense to stay close to 
the current number of islands. Therefore, in the following scheme we halve the 
population size with every successful generation. We will see that this does not 
worsen the asymptotic performance compared to Scheme A. For some problems 
this scheme will turn out to be superior. 



Algorithm 4 updatePopulationSize($P_t$, $P_{t+1}$) (Scheme B) 

1: if $\max\{f(x) \mid x \in P_{t+1}\} \le \max\{f(x) \mid x \in P_t\}$ then 
2:   return $2\mu_t$ 
3: else 
4:   return $\lceil \mu_t/2 \rceil$ 
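The two update schemes and the adaptive $(1+\lambda)$ EA of Algorithm 2 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the choice of OneMax as the fitness function, the use of standard bit mutation, and the `max_generations` safety cap are assumptions made for the example and are not prescribed by the algorithms above.

```python
import random


def update_scheme_a(size: int, success: bool) -> int:
    """Scheme A: double after an unsuccessful generation, reset to 1 after a success."""
    return 1 if success else 2 * size


def update_scheme_b(size: int, success: bool) -> int:
    """Scheme B: double after an unsuccessful generation, halve (rounding up) after a success."""
    return (size + 1) // 2 if success else 2 * size


def one_max(x):
    """Assumed example fitness function: number of one-bits."""
    return sum(x)


def adaptive_ea(n, update, rng, max_generations=10_000):
    """(1+lambda) EA with adaptive offspring population size on OneMax.

    Returns (generations, evaluations), i.e. the parallel and the
    sequential optimization time of this run.
    """
    x = [rng.randint(0, 1) for _ in range(n)]
    size, generations, evaluations = 1, 0, 0
    while one_max(x) < n and generations < max_generations:
        # Standard bit mutation: flip each bit independently with probability 1/n.
        offspring = [[b ^ (rng.random() < 1.0 / n) for b in x] for _ in range(size)]
        best = max(offspring, key=one_max)
        generations += 1
        evaluations += size
        success = one_max(best) > one_max(x)  # strict improvement of the best fitness
        if one_max(best) >= one_max(x):       # a best offspring replaces its parent if not worse
            x = best
        size = update(size, success)
    return generations, evaluations
```

Calling `adaptive_ea(20, update_scheme_a, random.Random(1))` and the same with `update_scheme_b` gives one sample each of the parallel and sequential times for the two schemes.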



The motivation for considering Scheme A is that we can assess the effect of 
gradually decreasing the population size, when comparing it to Scheme B. It 
also serves as a first step towards analyzing Scheme B, where the analysis turns 
out to be more involved. 

Our schemes for parallel EAs are applicable in large clusters where the cost 
of allocating new processors is low, compared to the computational effort spent 
within the evolutionary algorithm. Many of our results can be easily adapted 
towards algorithms that do not use migration and population size updates in 
every generation, but only every $\tau$ generations, for a parameter $\tau \in \mathbb{N}$, called 
migration interval. This can significantly reduce the costs for allocating and 
deallocating new processors. Details can be found at the end of Section 5. 

An algorithm using Scheme B can be implemented in a decentralized way 
as follows, where we assume that each island runs on a separate processor. 
Assume all processors are synchronized, i.e., they share a common timer. All 
processors have knowledge on the current best fitness level and they inform all 
other processors by sending messages in case they find a better fitness level. 
This message contains individuals that can be taken over by other processors 
so that all processors work on the current best fitness level. 

In the adaptive scheme, if after one generation no message has been received, 
i. e., no processor has found a better fitness level, each processor activates a new 
processor as follows. Each processor maintains a unique ID. The first processor 
has an ID that simply consists of an empty bit string. Each time a processor 
activates a new processor, it copies its current population and its current ID to 
the new processor. Then it appends a 0-bit to its ID while the new processor 
appends a 1-bit to its ID. At the end, all processors have enlarged their IDs 
by a single bit. When an improvement has been found, all processors first take 
over the genetic material in the messages that are passed. Then all processors 
whose ID ends with a 1-bit shut down. All other processors remove the last bit 
from their IDs. 

It is easy to see that with this mechanism all processors will always have 
pairwise distinct IDs and no central control is needed to acquire and shut down 
processors. 
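The ID mechanism can be illustrated with a small Python sketch that tracks only the set of processor IDs as bit strings. The function names are hypothetical, and the handling of a single processor with the empty ID (which simply stays active) is an assumption, chosen so that the population size never drops below 1.

```python
def activate(ids):
    """Unsuccessful generation: every processor activates one new processor.

    The existing processor appends a 0-bit to its ID; the new processor
    starts from a copy of the old ID and appends a 1-bit.
    """
    return [pid + bit for pid in ids for bit in "01"]


def shut_down(ids):
    """Successful generation: processors whose ID ends with a 1-bit shut down,
    all other processors remove the last bit from their IDs."""
    if ids == [""]:
        return ids  # assumption: a lone processor with the empty ID keeps running
    return [pid[:-1] for pid in ids if pid.endswith("0")]
```

Starting from `[""]`, three calls to `activate` yield the eight distinct 3-bit IDs, and one call to `shut_down` leaves the four distinct 2-bit IDs, without any processor needing global knowledge.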

As mentioned in the introduction, we define the parallel optimization time 
$T^{\mathrm{par}}$ as the number of generations until the first global optimum is evaluated. 
The sequential optimization time $T^{\mathrm{seq}}$ is defined as the number of function 
evaluations until the first global optimum is evaluated. The number of function 
evaluations is a common performance measure and it captures the total effort on 
all processors. Note that this includes all function evaluations in the generation 
of the algorithm in which the improvement is found. These definitions are 
consistent with the measures as suggested in the literature [9] . In both measures 
we allow ourselves to neglect the cost of the initialization as this only adds a 
fixed term to the running times. 

4 Tail Bounds and Expectations 

In preparation for upcoming running time analyses we first prove tail bounds for 
the parallel optimization times in a setting where we are waiting for a specific 
event to happen. This, along with bounds on the expected parallel and sequen- 
tial waiting times, will be useful to prove our main theorems. The tail bounds 
also indicate that the population will not grow too large. In the remainder of 
this paper we abbreviate $\max\{x, 0\}$ by $(x)^+$. 

Lemma 1. Assume starting with $2^k$ islands for some $k \in \mathbb{N}_0$ and doubling 
the number of islands in each generation. Let $T^{\mathrm{par}}(k,p)$ denote the random 
parallel time until the first island encounters an event that occurs independently 
on each island and in each generation with probability $p$. Let $T^{\mathrm{seq}}(k,p)$ be the 
corresponding sequential time. Then for every $\alpha \in \mathbb{N}_0$ 

1. $\Pr\left[T^{\mathrm{par}}(k,p) > (\lceil\log(1/p)\rceil - k)^+ + \alpha + 1\right] \le \exp(-2^\alpha)$, 

2. $\Pr\left[T^{\mathrm{par}}(k,p) \le \log(1/p) - k - \alpha\right] \le 2 \cdot 2^{-\alpha}$, 

3. $\log(1/p) - k - 3 < \mathrm{E}(T^{\mathrm{par}}(k,p)) < (\log(1/p) - k)^+ + 2$, 

4. $\max\{1/p, 2^k\} \le \mathrm{E}(T^{\mathrm{seq}}(k,p)) \le 2/p + 2^k - 1$. 

Each inequality remains valid if $p$ is replaced by a pessimistic estimation of $p$ 
(i.e., either an upper bound or a lower bound). 

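Proof. The condition $T^{\mathrm{par}}(k,p) > (\lceil\log(1/p)\rceil - k)^+ + \alpha + 1$ requires that the 
event does not happen on any island in this time period. The number of trials 
in the last generation is at least $2^{\lceil\log(1/p)\rceil + \alpha} \ge 1/p \cdot 2^\alpha$ for all $k \in \mathbb{N}_0$. Hence 

$$\Pr\left[T^{\mathrm{par}}(k,p) > (\lceil\log(1/p)\rceil - k)^+ + \alpha + 1\right] \le (1-p)^{1/p \cdot 2^\alpha} \le \exp(-2^\alpha).$$ 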



For the second statement we assume $k \le \log(1/p) - \alpha$ as otherwise the claim 
is trivial. A necessary condition for $T^{\mathrm{par}}(k,p) \le \log(1/p) - k - \alpha$ is that the 
event does happen at least once within the first $\log(1/p) - k - \alpha$ generations. 
This corresponds to at most $\sum_{i=1}^{\log(1/p)-\alpha} 2^{i-1} \le 2^{\log(1/p)-\alpha} = 1/p \cdot 2^{-\alpha}$ trials. 
If $p > 1/2$ the claim is trivial as either the probability bound on the right-hand 
side is at least 1 or the time bound is negative, hence we assume $p \le 1/2$. 
Observing that then $1/p \cdot 2^{-\alpha} \le 2(1/p - 1) \cdot 2^{-\alpha}$, the considered probability is 
bounded by 

$$1 - (1-p)^{2(1/p-1) \cdot 2^{-\alpha}} \le 1 - \exp(-2 \cdot 2^{-\alpha}) \le 1 - (1 - 2 \cdot 2^{-\alpha}) = 2 \cdot 2^{-\alpha}.$$ 

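To bound the expectation we observe that the first statement implies 
$\Pr\left[T^{\mathrm{par}}(k,p) > (\log(1/p) - k)^+ + \alpha + 2\right] \le \exp(-2^\alpha)$. Since $T^{\mathrm{par}}(k,p)$ is 
non-negative, we have 

$$\begin{aligned}
\mathrm{E}(T^{\mathrm{par}}(k,p)) &= \sum_{t=1}^{\infty} \Pr\left[T^{\mathrm{par}}(k,p) \ge t\right] \\
&\le (\log(1/p) - k)^+ + 1 + \sum_{\alpha=0}^{\infty} \Pr\left[T^{\mathrm{par}}(k,p) > (\log(1/p) - k)^+ + \alpha + 2\right] \\
&\le (\log(1/p) - k)^+ + 1 + \sum_{\alpha=0}^{\infty} \exp(-2^\alpha) \\
&< (\log(1/p) - k)^+ + 2
\end{aligned}$$ 

as the last sum is less than 1. For the lower bound we use that the second 
statement implies $\Pr\left[T^{\mathrm{par}}(k,p) > \log(1/p) - k - \alpha\right] \ge 1 - 2 \cdot 2^{-\alpha}$. Hence 

$$\begin{aligned}
\mathrm{E}(T^{\mathrm{par}}(k,p)) &= \sum_{t=1}^{\infty} \Pr\left[T^{\mathrm{par}}(k,p) \ge t\right] \\
&\ge \sum_{\alpha=2}^{\log(1/p)-k-1} \Pr\left[T^{\mathrm{par}}(k,p) > \log(1/p) - k - \alpha\right] \\
&\ge \sum_{\alpha=2}^{\log(1/p)-k-1} \left(1 - 2 \cdot 2^{-\alpha}\right) \\
&= \log(1/p) - k - 2 - \sum_{\alpha=1}^{\log(1/p)-k-2} 2^{-\alpha} \\
&> \log(1/p) - k - 3.
\end{aligned}$$ 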

For the fourth statement consider the islands one-by-one, according to some 
arbitrary ordering. Let $T(p)$ be the random number of sequential trials until an 
event with probability $p$ happens. It is well known that $\mathrm{E}(T(p)) = 1/p$. Obviously 
$T^{\mathrm{seq}}(k,p) \ge T(p)$ since the sequential time has to account for all islands 
that are active in one generation. This proves $\mathrm{E}(T^{\mathrm{seq}}(k,p)) \ge \mathrm{E}(T(p)) \ge 1/p$. 
The second lower bound $2^k$ is obvious as at least one generation is needed for a 
success. 

For the upper bound observe that $T^{\mathrm{seq}}(k,p) = 2^k$ in case $T(p) \le 2^k$ and 
$T^{\mathrm{seq}}(k,p) = \sum_{i=k}^{j} 2^i$ in case $\sum_{i=k}^{j-1} 2^i < T(p) \le \sum_{i=k}^{j} 2^i$ for some $j > k$. Together, we get 
that $T^{\mathrm{seq}}(k,p) \le \max\{2T(p), 2^k\} \le 2T(p) + 2^k - 1$, hence $\mathrm{E}(T^{\mathrm{seq}}(k,p)) \le 
2/p + 2^k - 1$. □ 
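The expectation bounds of Lemma 1 can be checked numerically. The sketch below computes both expectations from the fact that generation $t$ uses $2^{k+t-1}$ islands, so that $\Pr[T^{\mathrm{par}} \ge t] = (1-p)^{2^k(2^{t-1}-1)}$. The function name, the truncation tolerance `tol`, and the use of base-2 logarithms (consistent with $2^{\log(1/p)} = 1/p$ in the proof) are assumptions of this sketch.

```python
import math


def expected_times(k: int, p: float, tol: float = 1e-15):
    """Return (E[T_par], E[T_seq]) when starting with 2**k islands and doubling.

    The infinite sums are truncated once the remaining terms are below `tol`.
    """
    e_par = e_seq = 0.0
    t = 1
    while True:
        trials_before = 2 ** k * (2 ** (t - 1) - 1)  # trials in generations 1..t-1
        reach = (1.0 - p) ** trials_before           # Pr[T_par >= t]
        islands = 2 ** (k + t - 1)                   # islands active in generation t
        if reach * islands < tol:
            break
        e_par += reach            # E[T_par] = sum_t Pr[T_par >= t]
        e_seq += islands * reach  # generation t costs `islands` evaluations if reached
        t += 1
    return e_par, e_seq


if __name__ == "__main__":
    for k, p in [(0, 0.5), (0, 0.001), (3, 0.01)]:
        e_par, e_seq = expected_times(k, p)
        gap = math.log2(1.0 / p) - k
        print(k, p, e_par, (gap - 3, max(gap, 0.0) + 2), e_seq, (max(1 / p, 2 ** k), 2 / p + 2 ** k - 1))
```

For each pair $(k, p)$ the printed expectations can be compared directly with the intervals from statements 3 and 4.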

The presented tail bounds indicate that the population typically does not 
grow too large. The probability that the number of generations exceeds its 
expectation by an additive value of $\alpha + 1$ is even an inverse doubly exponential 
function. The following provides a more handy statement in terms of the 
population size. It follows immediately from Lemma 1. 

Corollary 1. Consider the setting described in Lemma 1. For every $\beta \ge 1$, 
$\beta$ a power of 2, the probability that while waiting for the event to happen the 
population size exceeds $\max\{2^{k+1}, 4/p\} \cdot \beta$ is at most $\exp(-\beta)$. 

One conclusion from these findings is that our schemes can be applied in 
practice without risking an overly large blowup of the population size. We now 
turn to performance guarantees in terms of expected parallel and sequential 
running times. 

5 Upper Bounds via Fitness Levels 

The following results are based on the fitness-level method, also known as the 
method of $f$-based partitions (see, e.g., Wegener [18]). This method is well 
known for proving upper bounds for algorithms that do not accept worsenings 
of the population. Consider a partition of the search space into sets $A_1, \dots, A_m$ 
where for all $1 \le i \le m-1$ all search points in $A_i$ are strictly worse than all 
search points in $A_{i+1}$ and $A_m$ contains all global optima. If each set $A_i$ contains 
only a single fitness value then the partition is called a canonic partition. 

If $s_i$ is a lower bound on the probability of creating a search point in $A_{i+1} \cup 
\dots \cup A_m$, provided the current best search point is in $A_i$, then the expected 
optimization time is bounded from above by 

$$\sum_{i=1}^{m-1} \Pr[A_i] \cdot \sum_{j=i}^{m-1} \frac{1}{s_j},$$ 

where $\Pr[A_i]$ abbreviates the probability that the best search point after 
initialization is in $A_i$. The reason for this bound is that the expected time until 
$A_i$ is left towards a higher fitness level is at most $1/s_i$ and each fitness level, 
starting from the initial one, has to be left at most once. Note that we can 
always simplify the above bound by pessimistically assuming that the population 
is initialized in $A_1$. This removes the term "$\sum_{i=1}^{m-1} \Pr[A_i]\,\cdot$" and only leaves 
$\sum_{j=1}^{m-1} 1/s_j$. This way of simplifying upper bounds can be used for all results 
presented hereinafter. 

The fitness-level method yields good upper bounds in many cases. This 
includes situations where an evolutionary algorithm typically moves through 
increasing fitness levels, without skipping too many levels [16]. It only gives 
crude upper bounds in case the values $s_i$ are dominated by search points from 
which the probability of leaving $A_i$ is much lower than for other search points 
in $A_i$ or if there are levels with difficult local optima (i.e., large values $1/s_i$) 
that are only reached with a small probability. 

Using the expectation bounds from Section 4, we now show in Theorems 1 and 2 
that for both schemes, A and B, the expected sequential waiting time in the upper 
bound for the expected parallel time can be replaced by its logarithm. In addition, 
the expected sequential time is asymptotically not larger than the upper bound 
for the serial algorithm, derived by $f$-based partitions. 

In the remainder of the paper we denote with $T_x^{\mathrm{par}}$ and $T_x^{\mathrm{seq}}$, $x \in \{A, B\}$, the 
parallel time and the sequential time for the schemes A and B, respectively. 

Theorem 1. Given an $f$-based partition $A_1, \dots, A_m$, 

$$\mathrm{E}(T_A^{\mathrm{seq}}) \le 2 \sum_{i=1}^{m-1} \Pr[A_i] \cdot \sum_{j=i}^{m-1} \frac{1}{s_j}.$$ 

If the partition is canonic then also 

$$\mathrm{E}(T_A^{\mathrm{par}}) \le 2 \sum_{i=1}^{m-1} \Pr[A_i] \cdot \sum_{j=i}^{m-1} \log\left(\frac{2}{s_j}\right).$$ 

The reason for the constant 2 in the $\log(2/s_j)$ term is to ensure that the 
term does not become smaller than 1; with a constant 1 the value $s_j = 1$ would 
even lead to a summand $\log(1/s_j) = 0$. 

Proof. We only need to prove asymptotic bounds on the conditional expectations 
when starting in $A_i$, with a common constant hidden in all $O$-terms. The 
law of total expectation then implies the claim. 

For Scheme A we apply Lemma 1 with $k = 0$. This yields that the expected 
sequential time for leaving the current fitness level $A_j$ towards $A_{j+1} \cup \dots \cup 
A_m$ is at most $2/s_j$ and the expected parallel time is at most $\log(1/s_j) + 2 \le 
2\log(2/s_j)$. The expected sequential time is hence bounded by $2 \sum_{j=i}^{m-1} 1/s_j$ 
and the expected parallel time is at most $2 \sum_{j=i}^{m-1} \log(2/s_j)$. □ 
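To see how strongly the logarithm changes the picture, the bounds from the proof above can be evaluated for a concrete list of success probabilities. The following sketch pessimistically assumes initialization on the first fitness level and uses base-2 logarithms; the function name and the example values of $s_j$ are hypothetical and chosen only for illustration.

```python
import math


def fitness_level_bounds(s):
    """Bounds for a run that pessimistically starts on the first fitness level.

    s is the list of lower bounds s_1, ..., s_{m-1} on the probabilities of
    leaving each level. Returns (serial, seq_a, par_a):
      serial: classical fitness-level bound, sum of 1/s_j
      seq_a:  bound on the expected sequential time of Scheme A, 2 * sum of 1/s_j
      par_a:  bound on the expected parallel time of Scheme A, 2 * sum of log(2/s_j)
    """
    serial = sum(1.0 / sj for sj in s)
    seq_a = 2.0 * serial
    par_a = 2.0 * sum(math.log2(2.0 / sj) for sj in s)
    return serial, seq_a, par_a


if __name__ == "__main__":
    # Hypothetical levels with improvement probabilities 1/8, 1/32 and 1/1024.
    print(fitness_level_bounds([2 ** -3, 2 ** -5, 2 ** -10]))
```

For these example values the sequential bound is twice the serial bound, while the parallel bound depends only on the exponents 3, 5 and 10.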

We prove a similar upper bound for Scheme B using arguments from the 
amortized analysis of algorithms [2, Chapter 17]. Amortized analysis is used to 
derive statements on the average running time of an operation or to estimate the 
total costs of a sequence of operations. It is especially useful if some operations 
may be far more costly than others and if expensive operations imply that 
many other operations will be cheap. The basic idea of the so-called accounting 
method is to let all operations pay for the costs of their execution. Operations are 
allowed to pay excess amounts of money to fictional accounts. Other operations 
can then tap this pool of money to pay for their costs. As long as no account 
becomes overdrawn, the total cost of all operations is bounded by the total 
amount of money that has been paid or deposited. 

Theorem 2. Given an $f$-based partition $A_1, \dots, A_m$, 

$$\mathrm{E}(T_B^{\mathrm{seq}}) \le 3 \sum_{i=1}^{m-1} \Pr[A_i] \cdot \sum_{j=i}^{m-1} \frac{1}{s_j}.$$ 

If the partition is canonic then also 

$$\mathrm{E}(T_B^{\mathrm{par}}) \le 4 \sum_{i=1}^{m-1} \Pr[A_i] \cdot \sum_{j=i}^{m-1} \log\left(\frac{2}{s_j}\right).$$ 

Proof. We use the accounting method to bound the expected sequential 
optimization time of B as follows. Assume the algorithm is on level $j$ with a 
population size of $2^k$. If the current generation passes without leaving the 
current fitness level, we pay $2^k$ to cover the costs for the sequential time in this 
generation. In addition, we pay another $2^k$ to a fictional bank account. In 
case the generation is successful in leaving $A_j$ and the previous generation was 
unsuccessful, we just pay $2^k$ and do not make a deposit. In case the current 
generation is successful and the last unsuccessful generation was on fitness level $j$, 
we withdraw $2^k$ from the bank account to pay for the current generation. In 
other words, the current generation is for free. This way, if there is a sequence 
of successful generations after an unsuccessful one on level $j$, all but the first 
successful generations are for free. 

Let us verify that the bank account cannot be overdrawn. The basic 
argument is that, whenever the population size is decreased from, say, $2^{k+1}$ to 
$2^k$ then there must be a previous generation where the population size was 
increased from $2^k$ to $2^{k+1}$. It is easy to see that associating a decrease with the 
latest increase gives an injective mapping. In simpler terms, the latest 
generation that has increased the population size from $2^k$ to $2^{k+1}$ has already paid for 
the current decrease to $2^k$. 

When in the upper bound for A fitness level $i$ takes sequential time $1 + 2 + 
\dots + 2^k = 2^{k+1} - 1$ then for B the total costs paid are $2(1 + 2 + \dots + 2^{k-1}) + 2^k$ 
as a successful generation does not make a deposit to the bank account. The 
total costs equal $2^{k+1} - 2 + 2^k \le 3/2 \cdot (2^{k+1} - 1)$. In consequence, the total costs 
for Scheme B are at most 3/2 the costs for A in A's upper bound. This proves 
the claimed upper bound for B. 
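As a quick sanity check of this comparison, consider the illustrative case $k = 3$ (the value of $k$ is chosen arbitrarily for this example):

```latex
\begin{align*}
\text{Scheme A:}\quad & 1 + 2 + 4 + 8 = 2^{4} - 1 = 15, \\
\text{Scheme B:}\quad & 2\,(1 + 2 + 4) + 8 = 2^{4} - 2 + 2^{3} = 22, \\
\text{ratio bound:}\quad & 22 \le \tfrac{3}{2} \cdot 15 = 22.5 .
\end{align*}
```

In general the difference between the two sides is $3 \cdot 2^k - \tfrac{3}{2} - (3 \cdot 2^k - 2) = \tfrac{1}{2}$, so the inequality holds for every $k$.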

By the very same argument an upper bound for the expected parallel time 
for B follows. Instead of paying $2^k$ and maybe making a deposit of $2^k$, we always 
pay 1 and always make a deposit of 1. When withdrawing money, we always 
withdraw 1. This proves that also $\mathrm{E}(T_B^{\mathrm{par}})$ is at most twice the corresponding 
upper bound for Scheme A. □ 

The argument in the above proof can also be used for proving a general upper 
bound for the expected parallel optimization time for B. When paying costs 2 
for each fitness level, this pays for the successful generation with a population 
size of, say, $2^k$ and for one future generation where the population size might 
have to be doubled to reach $2^k$ again. 

Imagine the sequence of population sizes over time and then delete all 
elements where the population size has decreased, including the associated 
generation where the population size was increased beforehand. In the remaining 
sequence the population size continually increases until, assuming a global 
optimum has not been found yet, after $n \log n$ generations a population size of 
at least $n^n$ is reached. In this case the probability of creating a global 
optimum by mutation is at least $1 - (1 - n^{-n})^{n^n} \ge 1 - 1/e$ as the probability of hitting 
any specific target point in one mutation is at least $n^{-n}$. The expected 
number of generations until this happens is clearly $O(1)$. We have thus shown the 
following. 

Corollary 2. For every function with $m$ function values we have $\mathrm{E}(T_B^{\mathrm{par}}) \le 
2m + n \log n + O(1)$. 
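The two quantitative facts behind the $n \log n + O(1)$ term can be written out explicitly. The derivation below assumes base-2 logarithms and standard bit mutation with mutation probability $1/n$ on bit strings of length $n$, as in the argument preceding the corollary.

```latex
\begin{align*}
2^{\,n \log n} &= \left(2^{\log n}\right)^{n} = n^{n}, \\
\Pr[\text{one mutation creates a fixed target}] &\ge \left(\tfrac{1}{n}\right)^{n} = n^{-n}, \\
\Pr[\text{at least one of } n^{n} \text{ mutations succeeds}]
  &\ge 1 - \left(1 - n^{-n}\right)^{n^{n}} \ge 1 - e^{-1} .
\end{align*}
```

The last step uses $(1 - x)^{1/x} \le e^{-1}$ for $0 < x \le 1$; a success probability of at least $1 - 1/e$ per generation gives an expected waiting time of at most $e/(e-1) = O(1)$ generations.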

This bound is asymptotically tight, for instance, for long path problems [U 
I13j . So, the $m$-term, in general, cannot be avoided. 

When comparing A and B with respect to the expected parallel time, we 
expect B to perform better if the fitness levels have a similar degree of difficulty. 
This implies that there is a certain target level for the population size. Note, 
however, that such a target level does not exist in case the $s_i$-values are 
dissimilar. In the case of similar $s_i$-values A might be forced to spend time doubling 
the population size for each fitness level until the target level has been reached. 
This waiting time is reflected by the $\log(2/s_j)$-terms in Theorem 1. The 
following upper bound on B shows that these log-terms can be avoided to some 
extent. In the special yet rather common situation that improvements become 
harder with each fitness level, only the biggest such log-term is needed. 

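Theorem 3. Given a canonical $f$-based partition $A_1, \dots, A_m$, $\mathrm{E}(T_B^{\mathrm{par}})$ is 
bounded by 

$$\sum_{i=1}^{m-1} \Pr[A_i] \cdot \left( 3(m - i - 1) + \log\left(\frac{1}{s_i}\right) + \sum_{j=i+1}^{m-1} \left( \log\left(\frac{1}{s_j}\right) - \log\left(\frac{1}{s_{j-1}}\right) \right)^{+} \right).$$ 

If additionally $s_1 \ge s_2 \ge \dots \ge s_{m-1}$ then the bound simplifies to 

$$\sum_{i=1}^{m-1} \Pr[A_i] \cdot \left( 3(m - i - 1) + \log\left(\frac{1}{s_{m-1}}\right) \right).$$ 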

Proof. The second claim immediately follows from the first one as the log-terms 
form a telescoping sum. 
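The telescoping effect can be made concrete with a short sketch. It sums the initial term $\log(1/s_i)$ and the positive increments $(\log(1/s_j) - \log(1/s_{j-1}))^+$ for the later levels, which are the log-terms that appear in the cost accounting of this proof. The function name, the 0-based indexing, and the example values are assumptions for illustration; logarithms are taken to base 2.

```python
import math


def log_terms(s, i=0):
    """Initial term log(1/s_i) plus all positive increments for later levels.

    s is a 0-based list holding s_1, ..., s_{m-1}; i is the 0-based index of
    the level the algorithm starts on.
    """
    logs = [math.log2(1.0 / sj) for sj in s]
    total = logs[i]
    for j in range(i + 1, len(s)):
        total += max(logs[j] - logs[j - 1], 0.0)  # the (x)^+ operation
    return total


if __name__ == "__main__":
    # Improvements become harder on every level: the sum telescopes to log(1/s_{m-1}).
    print(log_terms([2 ** -1, 2 ** -2, 2 ** -5]))
    # Non-monotone probabilities: every increase in difficulty has to be paid again.
    print(log_terms([2 ** -4, 2 ** -1, 2 ** -3]))
```

In the monotone case the result equals the largest log-term, whereas in the non-monotone example the sum exceeds it because the difficulty first drops and then rises again.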

For the first bound we again use arguments from amortized analysis. By 
Lemma 1, if the current population size is $2^k$ then the expected number of 
generations until an improvement from level $i$ happens is at most $(\log(1/s_i) - k)^+ + 2$. 
This is a bound of 2 for $k \ge \log(1/s_i)$. We perform a so-called aggregate 
analysis to estimate the total cost on all fitness levels. These costs are attributed 
to different sources. Summing up the costs for all sources will yield a bound on 
the total costs and hence on $T_B^{\mathrm{par}}$. 

In the first generation the fitness level $i^*$ the algorithm starts on pays 
$\log(1/s_{i^*})$ to the global bank account. Afterwards costs are assigned as 
follows. Consider a generation on fitness level $i$ with a population size of $2^k$. 

• If the current generation is successful, we charge cost 2 to the fitness level; 
cost 1 pays for the effort in the generation and cost 1 is deposited on the 
bank account. In addition, each fitness level $j$ that is skipped or reached 
during this improvement pays $(\log(1/s_j) - \log(1/s_{j-1}))^+$ as a deposit on 
the bank account. Note that this amount is non-negative and it may be 
non-integer. 

• If $k \ge \log(1/s_i)$ and the current generation is unsuccessful we charge cost 1 
to the fitness level. 

• If $k < \log(1/s_i)$ and the current generation is unsuccessful we withdraw 
cost 1 from our bank account. 

By Lemma 1 the expected cost charged to fitness level $i$ in unsuccessful 
generations (i.e., not counting the last successful generation) is at most 1. Assuming 
for the moment that the bank account is never overdrawn, the overall expected 
cost for fitness level $j$ is at most $1 + 2 + (\log(1/s_j) - \log(1/s_{j-1}))^+$. Adding the 
costs for the initial fitness level yields the claimed bound. 

We use the so-called potential method [2, Chapter 17] to show that the bank 
account is never overdrawn. Our claim is that at any point of time there is 
enough money on the bank account to cover the costs of increasing the current 
population size to at least $2^{\log(1/s_j)}$ where $j$ is the current fitness level. We 
construct a potential function indicating the excess money on the bank account 
and show that the potential is always non-negative. 

Let $\mu_t$ denote the population size in generation $t$ and $\ell_t$ be the
(random) fitness level in generation $t$. By $b_t$ we denote the account balance
on the bank account. We prove by induction that

\[ b_t \ge (\log(1/s_{\ell_t}) - \log(\mu_t))^+ . \]

As this bound is always non-negative, this implies that the account is never
overdrawn. After the initial fitness level has made its deposit we have
$b_1 = \log(1/s_{\ell_1}) - 0$. Assume by induction that the bound holds for $b_t$.

If generation $t$ is unsuccessful and $\log(\mu_t) \ge \log(1/s_{\ell_t})$ then the
population size is doubled at no cost for the bank account. As by induction
$b_t \ge 0$ we have

\[ b_{t+1} = b_t \ge 0 = (\log(1/s_{\ell_t}) - \log(\mu_{t+1}))^+ . \]

If generation $t$ is unsuccessful and $\log(\mu_t) < \log(1/s_{\ell_t})$ then the
algorithm doubles its population size and withdraws 1 from the bank account. As
$b_t$ is positive and $\log(\mu_{t+1}) = \log(\mu_t) + 1$, we have

\[ b_{t+1} = b_t - 1 \ge \log(1/s_{\ell_t}) - \log(\mu_t) - 1 = \log(1/s_{\ell_t}) - \log(\mu_{t+1}) . \]

If generation $t$ is successful and the current fitness level increases from $i$
to some $j > i$, the account balance is increased by

\[ 1 + \sum_{a=i+1}^{j} (\log(1/s_a) - \log(1/s_{a-1}))^+ \ge 1 + (\log(1/s_j) - \log(1/s_i))^+ . \]

This implies

\begin{align*}
b_{t+1} &\ge b_t + 1 + (\log(1/s_j) - \log(1/s_i))^+ \\
        &\ge (\log(1/s_i) - \log(\mu_t))^+ + 1 + (\log(1/s_j) - \log(1/s_i))^+ \\
        &\ge (\log(1/s_j) - \log(\mu_t))^+ + 1 \\
        &\ge (\log(1/s_j) - \log(\mu_{t+1}))^+ . \qquad \Box
\end{align*}

The upper bounds in this section can be easily adapted towards parallel
EAs that do not perform migration and population size adaptation in every
generation, but only every $\tau$ generations, for a migration interval
$\tau \in \mathbb{N}$. Instead of considering the probability of leaving a fitness
level in one generation, we simply consider the probability of leaving a fitness
level in $\tau$ generations. This is done by considering
$s_i' := 1 - (1 - s_i)^{\tau}$ instead of $s_i$. The resulting time bounds, based on
$s_1', \dots, s_{m-1}'$, are then with respect to the number of periods of $\tau$
generations. To get bounds on our original measures of time, we just multiply
all bounds by a factor of $\tau$.
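The substitution for migration intervals is mechanical, as the following minimal sketch illustrates. The function names and the parameter name `tau` (the migration interval) are hypothetical choices for this illustration.

```python
def period_success_probability(s, tau):
    """Probability of leaving a fitness level within tau generations: 1 - (1 - s)^tau."""
    return 1.0 - (1.0 - s) ** tau


def rescale_bound(bound_in_periods, tau):
    """Convert a bound counted in periods of tau generations into generations."""
    return tau * bound_in_periods
```

For example, a level with success probability 1/2 per generation is left within a period of two generations with probability 3/4.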

6 Lower Bounds for the Sequential Time 

In order to prove lower bounds for the expected sequential time we make use of 
recent results by Sudholt [12]. He presented a new lower-bound method based 
on fitness-level arguments. If it is unlikely that many fitness levels are skipped 
when leaving the current fitness-level set then good lower bounds can be shown. 

The lower bound applies to every algorithm A in pseudo-Boolean optimization
that only uses standard mutations (i.e., flipping each bit independently
with probability $1/n$) to create new offspring. Such an EA is called a
mutation-based EA. More precisely, every mutation-based EA A works as follows.
First, A creates $\mu$ search points $x_1, \dots, x_\mu$ uniformly at random. Then
it repeats the following loop. A counter $t$ counts the number of function
evaluations; after initialization we have $t = \mu$. In one iteration of the loop
the algorithm first selects one out of all search points $x_1, \dots, x_t$ that
have been created so far. This decision is based on the fitness values
$f(x_1), \dots, f(x_t)$ and, possibly, also the time index $t$. It performs a
standard mutation of this search point, creating an offspring $x_{t+1}$.
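The loop just described can be written down directly. The following is a minimal sketch under stated assumptions, not the authors' implementation: search points are tuples of 0/1 values, the evaluation budget `budget` is added only to make the loop terminate, and the `select` callback and the helper `select_best` are hypothetical names introduced here.

```python
import random


def standard_mutation(x, rng):
    """Flip each bit of x independently with probability 1/n."""
    n = len(x)
    return tuple(b ^ (rng.random() < 1.0 / n) for b in x)


def mutation_based_ea(f, n, mu, select, budget, rng):
    """Generic mutation-based EA; returns all created search points and their fitness.

    select(values, t, rng) returns the index of the parent among the t search
    points created so far (a hypothetical callback for this sketch).
    """
    points = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(mu)]
    values = [f(x) for x in points]
    t = mu                                   # counter of function evaluations
    while t < budget:
        parent = points[select(values, t, rng)]
        child = standard_mutation(parent, rng)
        points.append(child)
        values.append(f(child))              # each offspring is evaluated exactly once
        t += 1
    return points, values


def select_best(values, t, rng):
    """Elitist selection: the most recently created point with maximum fitness."""
    best = max(values)
    return max(i for i, v in enumerate(values) if v == best)
```

With `mu = 1` and `select_best` this behaves like a (1+1) EA; other population update schemes correspond to other `select` callbacks, which may also use the time index `t`.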

To make this work self-contained, we cite (a slightly simplified version of)
the result here. The performance measure considered is the number of function
evaluations. This can be assumed to coincide with the number of offspring
creations as every offspring needs to be evaluated exactly once.

Theorem 4 ([16]). Consider a partition of the search space into non-empty sets
$A_1, \dots, A_m$ such that only $A_m$ contains global optima. For a mutation-based
EA A we say that A is in $A_i$ or on level $i$ if the best individual created so
far is in $A_i$. Let the probability of traversing from level $i$ to level $j$ in
one mutation be at most $u_i \cdot \gamma_{i,j}$ and $\sum_{j=i+1}^{m} \gamma_{i,j} = 1$.
Assume that for all $j > i$ and some $0 < \chi \le 1$ it holds
$\gamma_{i,j} \ge \chi \sum_{k=j}^{m} \gamma_{i,k}$. Then the expected number of
function evaluations of A on $f$ is at least

\[ \sum_{i=1}^{m-1} \Pr[\text{A starts in } A_i] \cdot \chi \sum_{j=i}^{m-1} \frac{1}{u_j} . \]

All population update schemes are compatible with this framework; every
parallel mutation-based EA using an arbitrary population update scheme is still
a mutation-based EA. Offspring creations are performed in parallel in our
algorithms, but one can imagine these operations to be performed sequentially. We
can cast a parallel EA with parallel offspring creations as a sequential
mutation-based EA that simulates the population management of an island model in
the background. Recall that the selection in the notion of a mutation-based EA
can be based on the time index $t$. Hence, a sequential mutation-based EA can keep
track of the times when individuals on a specific island have been created or
when individuals have immigrated from a different island. The algorithm can
then simulate offspring creations for an island by allowing only individuals on
the island to become parents. There is one caveat: the parent selection
mechanism in [16] does not account for possibly randomized decisions made during
migration. However, the proof of Theorem 4 goes through in case additional
knowledge is used.

We introduce the notion of tight fitness levels, where the success probabilities
$s_i$ from the classical fitness-level method are exact up to a constant factor.

Definition 1. Call an $f$-based partition $A_1, \dots, A_m$ (asymptotically) tight
for an algorithm A if there exist constants $c \ge 1 \ge \chi > 0$ and values
$\gamma_{i,j}$ for $1 \le i < j \le m$ such that for each population in $A_i$ the
following holds.

1. The probability of generating a population in $A_{i+1} \cup \dots \cup A_m$ in
one mutation is at least $s_i$.

2. The probability of generating a population in $A_j$ in one mutation, $j > i$,
is at most $c \cdot s_i \cdot \gamma_{i,j}$.

3. For the $\gamma_{i,j}$-values it holds that $\sum_{j=i+1}^{m} \gamma_{i,j} = 1$ and
$\gamma_{i,j} \ge \chi \sum_{k=j}^{m} \gamma_{i,k}$ for all $i < j$.

Tight $f$-based partitions imply that the standard upper bound by $f$-based
partitions [18] is asymptotically tight. This holds for all elitist mutation-based
algorithms, that is, mutation-based algorithms where the best fitness value in
the population can never decrease.

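Theorem 5. Consider an algorithm A with an arbitrary population update
strategy that only uses standard mutations for creating new offspring. Given a
tight $f$-based partition $A_1, \dots, A_m$ for a function $f$, we have

\[ E(T^{\mathrm{seq}}) = \Omega\Bigl( \sum_{i=1}^{m-1} \Pr[\text{A starts in } A_i] \cdot \sum_{j=i}^{m-1} \frac{1}{s_j} \Bigr) . \]
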

Proof. The lower bound on $E(T^{\mathrm{seq}})$ follows by a direct application of
Theorem 4. We already discussed that this theorem applies to all algorithms
considered in this work. Setting $u_j := c s_j$ for all $1 \le j < m$, $c$ and
$\chi$ being as in Definition 1, Theorem 4 implies

\[ E(T^{\mathrm{seq}}) \ge \sum_{i=1}^{m-1} \Pr[\text{A starts in } A_i] \cdot \chi \sum_{j=i}^{m-1} \frac{1}{c s_j} . \]

As both $\chi$ and $c$ are constants, this implies the claim. □

This lower bound shows that for tight $f$-based partitions both our population
update schemes produce asymptotically optimal results in terms of the expected
sequential optimization time, assuming no cost of communications.

7 Non-oblivious Update Schemes

We also briefly discuss update schemes that are tailored towards particular
functions, in order to judge the performance of our oblivious update schemes.

Non-oblivious population update schemes may allow for smaller upper
bounds for the expected parallel time than the ones seen so far. When the
population update scheme has complete knowledge on the function $f$ and the
$f$-based partition, an upper bound can be shown where each fitness level
contributes only a constant to the expected parallel time. By
$T^{\mathrm{seq}}_{\mathrm{no}}$ and $T^{\mathrm{par}}_{\mathrm{no}}$ we denote the
sequential and parallel times of the considered non-oblivious scheme.

Theorem 6. Given an arbitrary $f$-based partition $A_1, \dots, A_m$, there is a
tailored population update scheme for which



In particular, $E(T^{\mathrm{par}}_{\mathrm{no}}) = O(m)$.

Proof. The update scheme chooses to use $\lceil 1/s_i \rceil$ islands if the
algorithm is in $A_i$. Then the probability of finding an improvement in one
generation is at least $1 - (1 - s_i)^{1/s_i} \ge 1 - 1/e$. The expected parallel
time until this happens is at most $e/(e-1)$ and so the expected sequential time
is at most $e/(e-1) \cdot \lceil 1/s_i \rceil \le 2e/(e-1) \cdot 1/s_i$. Summing up
these expectations for all fitness levels from $i$ to $m-1$ proves the two
bounds. □
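The two facts used in this proof are easy to check numerically. The following minimal sketch uses hypothetical function names; it only evaluates the formulas from the proof and makes no claim about how such a scheme would obtain the values $s_i$ in practice.

```python
import math


def islands_for_level(s):
    """Number of islands used on a level with success probability s: ceil(1/s)."""
    return math.ceil(1.0 / s)


def generation_success_probability(s, islands):
    """Probability that at least one of the islands finds an improvement."""
    return 1.0 - (1.0 - s) ** islands
```

With `islands_for_level(s)` islands the success probability per generation is at least 1 - 1/e, and the number of islands is at most 2/s, which is where the factor 2 in the sequential bound comes from.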

In some situations it is possible to design schemes that perform even better
than the above bound suggests. For instance, for trap functions the best strategy
would be to use a very large population in the first generation so that the
optimum is found with high probability, before the algorithm is tricked into
increasing the distance to the global optimum.

8 Bounds for Example Functions 

The previous bounds are applicable in a very general context, with arbitrary 
fitness functions. We also give results for selected example functions to estimate 
possible speed-ups in more concrete settings. 

We consider the same example functions and function classes that have been
investigated in [11]. The goal is the maximization of a pseudo-Boolean function
$f\colon \{0,1\}^n \to \mathbb{R}$. For a search point $x \in \{0,1\}^n$ write
$x = x_1 \dots x_n$, then $\mathrm{OneMax}(x) := \sum_{i=1}^{n} x_i$ counts the number
of ones in $x$ and $\mathrm{LO}(x) := \sum_{i=1}^{n} \prod_{j=1}^{i} x_j$ counts the
number of leading ones in $x$. A function is called unimodal if every non-optimal
search point has a Hamming neighbor (i.e., a point with Hamming distance 1 to it)
with strictly larger fitness. For $1 \le k \le n$ we also consider

\[ \mathrm{Jump}_k(x) := \begin{cases}
   k + \sum_{i=1}^{n} x_i & \text{if } \sum_{i=1}^{n} x_i \le n - k \text{ or } x = 1^n, \\
   \sum_{i=1}^{n} (1 - x_i) & \text{otherwise.}
\end{cases} \]

This function has been introduced by Droste, Jansen, and Wegener [4] as a func- 
tion with tunable difficulty. Evolutionary algorithms typically have to perform 
a jump to overcome a gap by flipping k specific bits. 
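For readers who want to experiment, the three example functions can be written as follows. This is a minimal sketch that assumes search points are sequences of 0/1 integers; the Python function names are chosen for this illustration.

```python
def one_max(x):
    """Number of ones in x."""
    return sum(x)


def leading_ones(x):
    """Length of the longest prefix of x that consists of ones only."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count


def jump(x, k):
    """Jump_k: OneMax shifted by k, except in a gap of width k below the optimum."""
    n, ones = len(x), sum(x)
    if ones <= n - k or ones == n:
        return k + ones
    return n - ones          # inside the gap the number of zeros is rewarded
```

For `k = 2` and `n = 4`, the point `(1, 1, 1, 0)` lies in the gap and has fitness 1, while the optimum `(1, 1, 1, 1)` has fitness 6.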

For these functions we obtain bounds for $T^{\mathrm{seq}}$ and $T^{\mathrm{par}}$ as
summarized in Table 1. The lower bounds for $E(T^{\mathrm{seq}})$ on OneMax and LO
follow directly from [16] for all schemes.





Function            Scheme          $E(T^{\mathrm{seq}})$     $E(T^{\mathrm{par}})$
OneMax              A               $\Theta(n \log n)$        $O(n \log n)$
                    B               $\Theta(n \log n)$        $O(n)$
                    non-oblivious   $\Theta(n \log n)$        $O(n)$
LO                  A               $\Theta(n^2)$             $\Theta(n \log n)$
                    B               $\Theta(n^2)$             $O(n)$
                    non-oblivious   $\Theta(n^2)$             $O(n)$
unimodal $f$        A               $O(dn)$                   $O(d \log n)$
with $d$ $f$-values B               $O(dn)$                   $O(d + \log n)$
                    non-oblivious   $O(dn)$                   $O(d)$
$\mathrm{Jump}_k$   A               $O(n^k)$                  $O(n \log n)$
with $k \ge 2$      B               $O(n^k)$                  $O(n + k \log n)$
                    non-oblivious   $O(n^k)$                  $O(n)$

Table 1: Asymptotic bounds for expected parallel running times
$E(T^{\mathrm{par}})$ and expected sequential running times $E(T^{\mathrm{seq}})$ for
the parallel (1+1) EA and the (1+$\lambda$) EA with adaptive population models.



Theorem 7. For the parallel (1+1) EA and the (1+$\lambda$) EA with adaptive
population models the upper bounds for $E(T^{\mathrm{seq}})$ and
$E(T^{\mathrm{par}})$ hold as given in Table 1.

Proof. The upper bounds for Scheme A follow from Theorem 1, for Scheme B
from Theorems 2 and 3, and for the non-oblivious scheme from Theorem 6.
Starting pessimistically from the first fitness level, the following bounds hold:

• For OneMax we use the canonical $f$-based partition
$A_i := \{x \mid \mathrm{OneMax}(x) = i\}$ and the corresponding success
probabilities $s_i \ge (n-i)/n \cdot (1 - 1/n)^{n-1} \ge (n-i)/(en)$. Hence,

\[ E(T^{\mathrm{par}}_A) \le 2 \sum_{i=0}^{n-1} \log\Bigl(\frac{2en}{n-i}\Bigr) \le 2n \log(2en) = O(n \log n), \]

\[ E(T^{\mathrm{seq}}_A) \le 2 \sum_{i=0}^{n-1} \frac{1}{s_i} \le 2 \sum_{i=0}^{n-1} \frac{en}{n-i} = 2en \sum_{i=1}^{n} \frac{1}{i} \le 2en \cdot [(\ln n) + 1], \]

$E(T^{\mathrm{par}}_B) \le 3(n-2) + \log(2en) = O(n)$ and
$E(T^{\mathrm{seq}}_B) \le 3en \cdot [(\ln n) + 1]$,
$E(T^{\mathrm{par}}_{\mathrm{no}}) = O(n)$ and
$E(T^{\mathrm{seq}}_{\mathrm{no}}) = O(n \log n)$.

• For LO we use the canonical $f$-based partition
$A_i := \{x \mid \mathrm{LO}(x) = i\}$ and the corresponding success probabilities
$s_i \ge 1/n \cdot (1 - 1/n)^{n-1} \ge 1/(en)$. Hence,

\[ E(T^{\mathrm{par}}_A) \le 2 \sum_{i=0}^{n-1} \log(2en) = 2n \log(2en) = O(n \log n), \]

\[ E(T^{\mathrm{seq}}_A) \le 2 \sum_{i=0}^{n-1} \frac{1}{s_i} \le 2 \sum_{i=0}^{n-1} en = 2en^2, \]

$E(T^{\mathrm{par}}_B) \le 3(n-2) + \log(en) = O(n)$,
$E(T^{\mathrm{seq}}_B) \le 3en^2$, $E(T^{\mathrm{par}}_{\mathrm{no}}) = O(n)$ and
$E(T^{\mathrm{seq}}_{\mathrm{no}}) = O(n^2)$.

• For unimodal functions with $d$ function values we use corresponding success
probabilities $s_i \ge 1/(en)$. Hence,

\[ E(T^{\mathrm{par}}_A) \le 2 \sum_{i=1}^{d-1} \log(2en) \le 2d \log(2en) = O(d \log n), \]

\[ E(T^{\mathrm{seq}}_A) \le 2 \sum_{i=1}^{d-1} \frac{1}{s_i} \le 2 \sum_{i=1}^{d-1} en \le 2edn, \]

$E(T^{\mathrm{par}}_B) \le 3(d-2) + \log(en) = O(d + \log n)$,
$E(T^{\mathrm{seq}}_B) \le 3edn$, $E(T^{\mathrm{par}}_{\mathrm{no}}) = O(d)$ and
$E(T^{\mathrm{seq}}_{\mathrm{no}}) = O(dn)$.

• For $\mathrm{Jump}_k$ functions with $k \ge 2$ and all individuals having neither
$n-k$ nor $n$ 1-bits, an improvement is found by either increasing or decreasing
the number of 1-bits. This corresponds to optimizing OneMax. In order
to improve a solution with $n-k$ 1-bits, a specific bit string with Hamming
distance $k$ has to be created, which has probability $s_{n-k}$ at least

\[ \Bigl(\frac{1}{n}\Bigr)^{k} \Bigl(1 - \frac{1}{n}\Bigr)^{n-k} \ge \frac{1}{e n^k} . \]

Hence, $E(T^{\mathrm{par}}_A) \le O(n \log n) + 2 \log(en^k) \le O(n \log n) + 2k \log(en) = O(n \log n)$,
$E(T^{\mathrm{seq}}_A) \le O(n^k)$,
$E(T^{\mathrm{par}}_B) \le O(n) + k \log(en) = O(n + k \log n)$,
$E(T^{\mathrm{seq}}_B) \le O(n^k)$, $E(T^{\mathrm{par}}_{\mathrm{no}}) = O(n)$ and
$E(T^{\mathrm{seq}}_{\mathrm{no}}) = O(n^k)$. □

It can be seen from Table 1 that both our schemes lead to significant speed-ups
in terms of the parallel time. The speed-ups increase with the difficulty of
the function. This becomes obvious when comparing the results on OneMax
and LO and it is even more visible for $\mathrm{Jump}_k$.
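To get a feeling for the constants, the explicit OneMax bounds for Scheme A from the proof of Theorem 7 can be evaluated numerically. The sketch below assumes base-2 logarithms (natural for a doubling scheme, but an assumption here) and plugs in the lower bound $(n-i)/(en)$ for each $s_i$; the function name is hypothetical.

```python
import math


def one_max_bounds_scheme_a(n):
    """Evaluate 2*sum(log2(2/s_i)) and 2*sum(1/s_i) for s_i = (n - i)/(e*n)."""
    s = [(n - i) / (math.e * n) for i in range(n)]
    parallel = 2 * sum(math.log2(2 / p) for p in s)
    sequential = 2 * sum(1 / p for p in s)
    return parallel, sequential
```

Both values stay below the closed forms $2n \log(2en)$ and $2en(\ln n + 1)$ respectively, and the gap between parallel and sequential bound widens as $n$ grows.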

The upper bounds for $E(T^{\mathrm{par}}_B)$ are always asymptotically lower than
those for $E(T^{\mathrm{par}}_A)$, except for $\mathrm{Jump}_k$ with $k = \Theta(n)$.
However, without corresponding lower bounds we cannot say whether this is due to
differences in the real running times or whether we simply proved tighter
guarantees for B. We therefore consider the function LO in more detail and prove
a lower bound for A. This demonstrates that Scheme B can be asymptotically better
than Scheme A on a concrete problem.

Theorem 8. For the parallel (1+1) EA and the (1+$\lambda$) EA with adaptive
population models on LO we have $E(T^{\mathrm{par}}_A) = \Omega(n \log n)$.

Proof. We consider a pessimistic setting (pessimistic for proving a lower bound)
where an improvement has probability exactly $1/n$. This ignores that all leading
ones have to be conserved in order to increase the best LO-value. We show
that with probability $\Omega(1)$ at least $n/30$ improvements are needed in this
setting. As by Lemma 1 the expected waiting time for an improvement is at least
$\max\{0, (\log n) - 3\}$, the conditional expected parallel time is
$\Omega(n \log n)$. By the law of total expectation, also the unconditional
expected parallel time is then $\Omega(n \log n)$.

Let us bound the expected increase in the number of leading ones on one
fitness level. Let $T_i^{\mathrm{par}}$ denote the random number of generations
until the best fitness increases when the algorithm is on fitness level $i$. By
the law of total expectation the expected increase in the best fitness in this
generation equals

\[ \sum_{t=1}^{\infty} \Pr[T_i^{\mathrm{par}} = t] \cdot E(\text{LO-increase} \mid T_i^{\mathrm{par}} = t) . \tag{1} \]

The expected increase in the number of leading ones can be estimated as follows.
With $T_i^{\mathrm{par}} = t$ the number of mutations in the successful generation
is $2^{t-1}$. Let $I$ denote the number of mutations that increase the current
best LO-value. A well-known property of LO is that when the current best fitness
is $i$ then the bits at positions $i+2, \dots, n$ are uniform. Bits that form part
of the leading ones after an improvement are called free riders. The probability
of having $k$ free riders is thus $2^{-k-1}$ (unless the end of the bit string is
reached) and the expected number of free riders is at most
$\sum_{k=0}^{\infty} 2^{-k-1} \cdot k = 1$.

The uniformity of "random" bits at positions $i+2, \dots, n$ holds after any
specific number of mutations and in particular after the mutations in generation
$T_i^{\mathrm{par}}$ have been performed. However, when looking at multiple
improvements, the free-rider events are not necessarily independent as the
"random" bits are very likely to be correlated. The following reasoning avoids
these possible dependencies. We consider the improvements in generation
$T_i^{\mathrm{par}}$ one-by-one. If $F_1$ denotes the random number of free riders
gained in the first improvement, when considering the second improvement the
bits at positions $i + 3 + F_1, \dots, n$ are still uniform. In some sense, we give
away the free riders from a fitness improvement for free for all following
improvements. This leads to an estimation of $1 + F_1$ for the gain in the number
of leading ones.

Iterating this argument, the expected total number of leading ones gained is
thus bounded by $2I$, the expectation being taken for the randomness of free
riders. Also considering the expectation for the random number of improvements
yields the bound $2E(I \mid I \ge 1)$ as $I$ has been defined with respect to the
last (i.e. successful) generation. We also observe
$E(I \mid I \ge 1) \le 1 + E(I) \le 1 + 2^{t}/n$. Plugging this into Equation (1)
yields

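\begin{align*}
 & \sum_{t=1}^{\infty} \Pr[T_i^{\mathrm{par}} = t] \cdot (2 + 2^{t+1}/n) \\
={} & 2 + 2 \sum_{t=0}^{\infty} \Pr[T_i^{\mathrm{par}} = t+1] \cdot 2^{t+1}/n \\
\le{} & 2 + 2 \sum_{t=0}^{\infty} \Pr[T_i^{\mathrm{par}} > t] \cdot 2^{t+1}/n \\
\le{} & 2 + 2 \sum_{t=0}^{\lceil \log n \rceil} 2^{t+1}/n
        + 2 \sum_{t=\lceil \log n \rceil + 1}^{\infty} \Pr[T_i^{\mathrm{par}} > t] \cdot 2^{t+1}/n .
\end{align*}

The first sum (including its leading factor 2) is at most 16. Using Lemma 1 to
estimate the second sum, we arrive at the upper bound

\begin{align*}
 & 18 + 2 \sum_{a=0}^{\infty} \Pr[T_i^{\mathrm{par}} > \lceil \log n \rceil + a + 1] \cdot 2^{\lceil \log n \rceil + a + 2}/n \\
\le{} & 18 + 2 \sum_{a=0}^{\infty} \exp(-2^{a}) \cdot 2^{\lceil \log n \rceil + a + 2}/n \\
\le{} & 18 + 16 \cdot \sum_{a=0}^{\infty} \exp(-2^{a}) \cdot 2^{a} \\
<{} & 29.8 .
\end{align*}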

With probability 1/2 the algorithm starts with no leading ones, independently
from all following events. The expected number of leading ones after $n/30$
improvements is at most $29.8/30 \cdot n$. By Markov's inequality the probability
of having created $n$ leading ones is thus at most $29.8/30$ and so with
probability $1/2 \cdot 0.2/30 = \Omega(1)$ having $n/30$ improvements is not enough
to find a global optimum. □
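The numerical constant in this proof can be double-checked with a few lines of code. The sketch below evaluates the series $18 + 16 \sum_{a \ge 0} \exp(-2^a) \cdot 2^a$ that appears at the end of the estimate; the function name and the truncation after 40 terms are choices made for this illustration (later terms are far below floating-point precision).

```python
import math


def expected_gain_bound(terms=40):
    """Evaluate 18 + 16 * sum_{a=0}^{terms-1} exp(-2^a) * 2^a."""
    tail = sum(math.exp(-(2 ** a)) * 2 ** a for a in range(terms))
    return 18 + 16 * tail
```

The value is roughly 29.43, comfortably below the 29.8 used in the proof, and only the first four or five terms of the series contribute noticeably.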

9 Generalizations & Extensions 

We finally discuss generalizations and extensions of our results. 

One interesting question is in how far our results change if the population is
not doubled or halved, but instead multiplied or divided by some other value
$b > 1$. Then the results would change as follows. With some potential
adjustments to constant factors, the log-terms in the parallel optimization times
in Theorems 1, 2 and 3 would have to be replaced by $\log_b$. For the sequential
optimization times stated in these theorems one would need to multiply these
bounds by $b/2$. This means that a larger $b$ would further decrease the parallel
optimization times at the expense of a larger sequential optimization time.

Our analyses can also be transferred towards the adaptive scheme presented
by Jansen, De Jong, and Wegener [H]. Recall that in their scheme the population
size is divided by the number of successes. In case of one success the population
size remains unchanged. This only affects the constant factors in our upper
bounds. When the number of successes is large, the population size might
decrease quickly. In most cases, however, the number of successes will be rather
small; for instance, the lower bound for LO, Theorem 8, has shown that the
expected number of successes in a successful generation is constant. However,
it might be possible that after a difficult fitness level an easier fitness level is
reached and then the number of successes might be much higher. In an extreme
case their scheme can decrease the population size like Scheme A. In some sense,
their scheme is somewhat "in between" A and B. With a slight adaptation of the
constants, the upper bound for Scheme A from Theorem 1 can be transferred
to their scheme.

Another extension of the results above is towards maximum population sizes.
Although we have argued earlier that the population size does not blow up
too much, in practice the maximum number of processors might be limited. The
following theorem about $E(T^{\mathrm{par}}_A)$ for maximum population sizes can be
proven by applying arguments from [11].

Theorem 9. The expected parallel optimization time of Scheme A for a maximum
population size $\mu := \mu_{\max} \ge 1$ is bounded by



Proof. We pessimistically estimate the expected parallel time by the time until
the population consists of $\mu_{\max}$ islands plus the expected optimization time
if $\mu_{\max}$ islands are available. The time until $\mu_{\max}$ islands are
involved is $\log \mu_{\max}$ on one fitness level. Hence, summing up all levels
pessimistically gives $m \log \mu_{\max}$. For $\mu_{\max}$ islands the success
probability on fitness level $i$ with success probability $s_i$ for one island is
given by $1 - (1 - s_i)^{\mu_{\max}}$. Hence, the expected time for leaving fitness
level $i$ if $\mu_{\max}$ islands are available is at most
$1/[1 - (1 - s_i)^{\mu_{\max}}]$. Now we consider two cases.
If $s_i \cdot \mu_{\max} \le 1$ we have
$1 - (1 - s_i)^{\mu_{\max}} \ge 1 - (1 - s_i \mu_{\max}/2) = s_i \mu_{\max}/2$
because for all $0 \le xy \le 1$ it holds $(1 - x)^y \le 1 - xy/2$ [11, Lemma 1].
Otherwise, if $s_i \cdot \mu_{\max} > 1$ we have
$1 - (1 - s_i)^{\mu_{\max}} \ge 1 - e^{-s_i \mu_{\max}} \ge 1 - 1/e$. Thus, the expected
time for leaving fitness level $i$ is at most $2/(s_i \mu_{\max})$ in the first case
and at most $e/(e-1)$ in the second case.
Adding the expected waiting times until $\mu_{\max}$ islands are involved yields the
claimed bound. □
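The case distinction in this proof can be checked numerically. The following minimal sketch, with hypothetical function names, compares the exact expected waiting time $1/[1 - (1 - s)^{\mu_{\max}}]$ with the bound obtained from the two cases.

```python
import math


def waiting_time_with_cap(s, mu_max):
    """Expected number of generations to leave a level when mu_max islands are used."""
    return 1.0 / (1.0 - (1.0 - s) ** mu_max)


def case_bound(s, mu_max):
    """Upper bound from the two cases: 2/(s*mu_max) if s*mu_max <= 1, else e/(e-1)."""
    if s * mu_max <= 1:
        return 2.0 / (s * mu_max)
    return math.e / (math.e - 1)
```

For instance, with `s = 0.5` and `mu_max = 2` the exact waiting time is 4/3 while the case bound is 2.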



In terms of our test functions OneMax, LO, unimodal functions, and
$\mathrm{Jump}_k$, this leads to the following result that can be proven like
Theorem 7.

Corollary 3. For the parallel (1+1) EA and the (1+$\lambda$) EA with Scheme A the
following holds for a maximum population size $\mu := \mu_{\max} \ge 1$.

• $E(T^{\mathrm{par}}_A) = O(n \log \mu_{\max} + n \log(n)/\mu_{\max})$ for OneMax, which
gives $O(n \log\log n)$ for $\mu_{\max} = \log n$,

• $E(T^{\mathrm{par}}_A) = O(n \log \mu_{\max} + n^2/\mu_{\max})$ for LO, which gives
$O(n \log n)$ for $\mu_{\max} = n$,

• $E(T^{\mathrm{par}}_A) = O(d \log \mu_{\max} + dn/\mu_{\max})$ for unimodal functions
with $d$ function values, which gives $O(d \log n)$ for $\mu_{\max} = n$,

• $E(T^{\mathrm{par}}_A) = O(n \log \mu_{\max} + n^k/\mu_{\max})$ for $\mathrm{Jump}_k$,
which gives $O(nk \log n)$ for $\mu_{\max} = n^{k-1}$.

Note that Corollary 3 has led to an improvement of $E(T^{\mathrm{par}}_A)$ on OneMax
from $O(n \log n)$ to $O(n \log\log n)$ for $\mu_{\max} = \log n$. This obviously also
holds in the setting of unrestricted population sizes.

10 Conclusions 

We have presented two schemes for adapting the offspring population size in
evolutionary algorithms and, more generally, the number of islands in parallel
evolutionary algorithms. Both schemes double the population size in each
generation that does not yield an improvement. Despite the exponential growth,
the expected sequential optimization time is asymptotically optimal for tight
$f$-based partitions. In general, we obtain bounds that are asymptotically equal
to upper bounds via the fitness-level method.

In terms of the parallel computation time, expected waiting times on a fitness
level can be replaced by their logarithms for both schemes, compared to a serial
EA. This yields a tremendous speed-up, in particular for functions where finding
improvements is difficult. Scheme B, doubling or halving the population size in
each generation, turned out to be more effective than resets to a single island
as in Scheme A. This is because B can quickly decrease the population size if
necessary. The effort spent while this happens does not affect the asymptotic
bounds for expected parallel and sequential times.

Apart from our main results, we have introduced the notion of tight $f$-based
partitions and new arguments from amortized analysis of algorithms to the
theory of evolutionary algorithms.

An open question is how our schemes perform in situations where the
fitness-level method does not provide good upper bounds. In this case our bounds
may be off from the real expected running times. In particular, there may
be examples where increasing the offspring population size by too much might
be detrimental. One constructed function where large offspring populations
perform badly was presented in [5]. Future work could characterize function
classes for which our schemes are efficient in comparison to the real expected
running times. The notion of tight $f$-based partitions is a first step in this
direction.